Chris Castiglione Co-founder of Console.xyz. Adjunct Prof at Columbia University Business School.

Python vs SQL – What’s the Difference?

4 min read

Python vs SQL - What’s the Difference?

Python vs. SQL | Pros and Cons

Approximately twenty years ago, there were only a handful of programming languages that a software engineer would need to know well. Even back then, Structured Query Language, or SQL, was the go-to language when you needed to gain quick insight on some data, fetch records, and then draw preliminary conclusions that might, eventually, lead to a report or to writing an application. 

Nowadays, data comes in many shapes and formats, not necessarily synonymous with relational databases any longer. Data exists within CSV files, plain text, and on the web, as well as in many other different formats. It is in this maze of data that Python’s broad toolset of libraries shine.

With a vast set of helper libraries and associated platforms, Python is an excellent programming language for quick, and iterative data exploration. Python’s set of libraries include everything from data visualization to statistical analysis, making it convenient for developers to jump into data analysis and begin identifying patterns.

Use cases for SQL and Python

SQL is designed to query and extract data from tables within a database. SQL is good at allowing you as a developer, to seamlessly join (or merge) several data together. Combining data from multiple tables is a key strength.

However, SQL isn’t designed for manipulating or transforming data into other formats. Types of higher-level data manipulation that are common within data science, such as statistical analysis, regression tests, and time-series data manipulation, are is very difficult to achieve using SQL exclusively.

Python, on the other hand, has a well-known data analysis Library called Pandas, which has been specially designed for data analysis and manipulation.

Python is particularly well suited for structured (tabular) data which can be fetched using SQL and then require farther manipulation, which might be challenging to achieve using SQL alone. This leads to the following question: When should you use Python vs. SQL?

SQL and Python Differences

Use SQL to retrieve the essential data required for the analysis, and then use specialized Python libraries to process it.

Let’s take a look at an example using SQL: 

Say we have the following database table with hotel information called hotels. This is a small sample dataset of the information contained within the table. This table could include thousands or millions of rows (entries), each one representing a hotel.

Name City Street Price Currency Stars Phone
1 Via dei Valeri Rome Via Dei Valeri 1 54

USD

5

108 Minutes Hostel Moscow Malaya Ordynka Street 5/6 Bld.4 App.3 24

USD

1

7(495) 795-4259

1st Arbat Hostel at Novinsky Moscow Novinsky Bulvar 1/2 App 64 26

USD

1

7(967)0130047

A Plus Hostel Prague Na Florenci 14 13/33 14

USD

3

Aalborg Hotel Amsterdam Amsterdam Sarphatipark 106 39

USD

3

106 – 1073

Adagio Access Vanves Porte de Chatillon Paris 5 rue Aristide Briand, 53

USD

3,5

33140955600

Advisor Hotel Hanoi Hanoi 51 Nguyen Sieu Street Hoan Kiem 13

USD

2

39935766

Afrian House Hotel Cairo 15 Emad El Din Street 11

USD

1

Agon Aldea Hotel Berlin Berlin Bulowstrasse 19-22 22

USD

3

Agustina Serviced Apartments Quezon City Manila No. 9 18th St. Brgy Kalusugan (New Manila) 27

USD

3

2098647-25

Let’s compare how we could query this dataset using SQL versus Python using Pandas.

Type of Action SQL Python using Pandas
Retrieve all the hotels SELECT * FROM hotels   hotels
Retrieve the first 3 hotels SELECT * FROM hotels LIMIT 3 hotels.head(3)
Get the names of hotels in Paris SELECT Name FROM hotels WHERE City = ‘Paris hotels[hotels.City == ‘Paris‘].Name

As we can see, so far, both are quite similar. However, things start to become more interesting when we need to do more complex queries.

Examples when Python is easier to use than SQL

Let’s say we did some preliminary data querying, and now have a dataset table called hotels_by_city, that contains the number of hotels per city, which looks as follows:

City Hotel_Count
Paris 567
Moscow 482

With this new dataset, let’s say we want to order results by Hotel _Count and only select the top ten cities with the largest count. We could do this as follows:

SQL Python using Pandas
SELECT City FROM hotels_by_city ORDER BY Hotel_Count DESC LIMIT 10 hotels_by_city.nlargest(10, columns=’Hotel_Count‘)

You can start to see that the Pandas syntax is becoming easier to grasp and write, as the examples of the data being retrieved, become a bit more complicated.

Let’s add a layer of complexity, so we now need to determine using this new dataset table, which is the next ten hotels, after the top ten.

SQL Python using Pandas
SELECT City FROM hotels_by_city ORDER  BY Hotel_Count DESC LIMIT 10 OFFSET 10 hotels_by_city.nlargest(20, columns=’Hotel_Count‘).tail(10)

Notice how the Pandas syntax remains almost unaltered as complexity increases, whereas the SQL syntax becomes more complex to read.

Another way to think about differences between Python and SQL is that SQL provides the setup and Python takes you from there. SQL allows you to start with a dataset which is structured into tables. Once you have the tables, you use Python to branch off different analyses in various directions.

Types of SQL

MySQL (pronounced MY Sequel) is the most popular type of SQL database. The super-geek way to refer to this type of SQL database as a relational database management system (or RDBMS). More or less, that means it’s a database with many tables that link together in some way. All WordPress installations by default are using MySQL.

PostgreSQL (often just referred to as “Postgres”) is similar to MySQL but is known to be more durable. However, setting up Postgres for newbies can often be difficult, so if you are new to this, MySQL is probably you’re best bet.

SQLite (pronounced Sequel Lite) is a very portable, and compact database. SQLite is usually best used for testing and very, very simple applications.

When to choose Python vs. SQL?

SQL contains a much simpler and narrow set of commands compared to Python. In SQL, queries almost exclusively use some combination of JOINS, aggregate functions, and subqueries functions.

Python, by contrast, is like a collection of specialized Lego sets, each with a specific purpose. Each library has custom pieces for building a particular type of a particular thing: Pandas for data analysis, PyPDF2 for PDF manipulation, Scikit-learn for machine learning, and much more.

Which language to learn first Python or SQL?

We think that the best place to start is by learning  SQL. SQL is an essential tool for any kind of data retrieval from relational databases, even if you’re primary job has little or nothing to do with data analysis.

Even if the SQL query is ten times longer then the equivalent Python script, it feels easier to do then doing the equivalent in Python because it reads like English. Remember, learning is more laborious than typing, and takes more time.

Once you have a reasonable knowledge of SQL, that you can at least write a query that joins two tables together, attempt to rewrite some of that same logic in Python using Pandas.

Comparison Chart

Python SQL
Founded Created in 1991 by Guido Van Rossum Created by Donald D. Chamberlin and Raymond F. Boyce in 1974
Pros A versatile, dynamic, easy, broad scope A very easy learning curve, English-like syntax.
Cons Steeper learning curve then SQL Mostly applicable only in relational databases (with a few exceptions)
Web Libraries A library for almost anything None, SQL is just SQL.
Types   Python 2 and Python 3 are the two most common types of Python MySQL, SQL Server, PostgreSQL, SQLite
Usage Google

YouTube

DropBox

Almost every high-tech company.

Furthermore, Python is widely used in many organizations for data-science and exploration.

Nearly every website on the internet uses a SQL database for its back-end.

Thanks to Eduardo Freitas for his help co-writing and proofing this piece! 

Learn to Code Comment Avatar
Chris Castiglione Co-founder of Console.xyz. Adjunct Prof at Columbia University Business School.