From the course: Data Engineering Foundations

Sources of data extraction

- [Instructor] As we know, the E in ETL stands for extract. To get started with developing an ETL pipeline, let's first look at the common data sources and data types that we have to deal with. So what do we mean by extracting data? Very roughly, this means reading data from persistent storage into memory. This persistent storage could be a file on Amazon S3, for example, or a SQL database, or a web API. Extraction is the necessary stage before we can start transforming the data, and the sources here may vary. First of all, we can extract data from plain text files. These are files that are generally readable by people. They can be unstructured, like an article from Forbes magazine. Alternatively, they can be flat files, where each row is a record and each column is an attribute of the record. In the latter, we represent data in a tabular format. So typical examples of flat files are comma or tab separated…
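As an illustration of extracting a flat file into memory, here is a minimal sketch using Python's standard `csv` module. The sample data, the column names, and the helper `extract_flat_file` are hypothetical and not from the course; in-memory strings stand in for files on disk so that the example is self-contained.

```python
import csv
import io

# Hypothetical sample data standing in for flat files in persistent storage.
CSV_TEXT = "name,country\nAda,UK\nGrace,US\n"
TSV_TEXT = "name\tcountry\nAda\tUK\nGrace\tUS\n"


def extract_flat_file(stream, delimiter=","):
    """Read a delimited text stream into memory as a list of records.

    Each row becomes one record, and each column header becomes
    an attribute name on that record.
    """
    reader = csv.DictReader(stream, delimiter=delimiter)
    return list(reader)


# The same helper handles comma-separated and tab-separated input;
# only the delimiter differs.
csv_records = extract_flat_file(io.StringIO(CSV_TEXT))
tsv_records = extract_flat_file(io.StringIO(TSV_TEXT), delimiter="\t")

for record in csv_records:
    print(record["name"], record["country"])
```

For a real file, the `io.StringIO(...)` argument would be replaced by a file object, for example one returned by `open(path, newline="")`, where `path` is whatever location the data lives at.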
