Loading Data into Postgres using Python and CSVs
Data storage is one of the most integral parts of a data system, if not the most integral. You will find hundreds of articles online detailing how to write insane SQL analysis queries, how to run complex machine learning algorithms on petabytes of training data, and how to build statistical models on thousands of rows in a database. The only problem? Almost none of them explain how the data gets stored in the first place.
Whether you are a data analyst, data scientist, data engineer, or even a web developer, it is important to know how to store and access your data. In this blog post, we are going to focus on a type of data storage called a relational database. Relational databases are among the most common storage choices for web applications and large businesses and, most relevant here, for data platforms.
Specifically, we'll be focusing on Postgres (or PostgreSQL), one of the biggest open source relational databases. We like Postgres for its high stability, its easy availability on cloud providers (AWS, Google Cloud, etc.), and the fact that it is open source! Using the Python library psycopg2, we will run through an example of how you can create your own table from scratch and then load a data set into a locally running Postgres server.
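As a rough preview of where we're headed, here is a minimal sketch of those two steps, creating a table and bulk-loading a CSV into it, using psycopg2's `copy_expert` with Postgres's `COPY ... FROM STDIN`. Several things here are assumptions for illustration, not part of any particular setup: the connection string `DSN`, the table name `users`, the helper names `create_table_sql` and `load_csv`, and the choice to store every CSV column as `TEXT`. It also assumes the CSV header contains trusted column names, since they are pasted directly into the SQL.

```python
import csv

# Hypothetical settings: point these at your own local Postgres server.
TABLE_NAME = "users"
DSN = "host=localhost dbname=postgres user=postgres password=postgres"


def create_table_sql(table, header):
    """Build a CREATE TABLE statement with one TEXT column per CSV header field.

    Typing every column as TEXT is a simplifying assumption; a real table
    would usually declare INTEGER, DATE, etc. per column. Names are quoted
    but not escaped, so this assumes the header comes from a trusted file.
    """
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    return f'CREATE TABLE IF NOT EXISTS "{table}" ({columns});'


def load_csv(path, table=TABLE_NAME, dsn=DSN):
    """Create the table (if needed) and bulk-load the CSV rows with COPY."""
    import psycopg2  # imported here so create_table_sql works without the driver

    with open(path, newline="") as f:
        header = next(csv.reader(f))  # first row holds the column names
        f.seek(0)  # rewind so COPY sees the whole file, header included
        # The connection's context manager commits on success and rolls back
        # on error, but it does not close the connection.
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute(create_table_sql(table, header))
            # COPY ... FROM STDIN streams the file in a single command;
            # HEADER true tells Postgres to skip the first line.
            cur.copy_expert(
                f'COPY "{table}" FROM STDIN WITH (FORMAT csv, HEADER true)', f
            )
    conn.close()
```

With a server running locally, calling `load_csv("users.csv")` (a hypothetical file) would create the `users` table and fill it. `COPY` is used instead of one `INSERT` per row because it loads the whole file in a single command, which is typically much faster for larger data sets.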