From the course: Oracle DB Cloud Database Migration and Integration Workshop

Data integration

Welcome to Oracle University's lesson: Data Integration. I'm your host, Desiree. Let's get started. In this lesson, I'm going to show you the basics of data integration and transformation, then introduce Oracle Data Integrator, followed by its architecture, design, and platform.

Before we go into Oracle Data Integrator, let's talk about what the data integration and transformation process is. Data at the source is rarely in a form useful for target applications. For example, data gets generated in various ERP, marketing, and CRM systems. Nowadays, there are other data sources as well, such as social media and IoT devices. To make use of this data, it needs to be moved somewhere else where we're going to do something with it, for example analytics. In this case, we are transporting the data to a different physical location. The data as it is, is not very useful; it needs to be transformed, for example for warehousing purposes. In that case, data sets may need to be joined with each other and transformed into different data types or different attributes so they can be put to use. This is called the data integration process.

Oracle's product for this is Oracle Data Integrator, or ODI. It is actually quite a mature product and has been around for some time. It pioneered the E-LT approach, which is a very innovative way of doing data integration, and it excels in bulk data performance. If you have any need for bulk data transformations or movements, ODI is the best tool. It also leverages existing hardware, which reduces implementation costs as well. Once again, this product is very mature, with thousands of customers, and it is very flexible to work with; we will talk about these characteristics shortly. Any data integration tool needs heterogeneous connectivity to various sources and targets to make it really useful, and ODI has many connectors like that.
We will talk about that as well in this presentation. Now, let's look at the architecture and why ODI is different.

In the conventional ETL architecture, data is extracted from the sources, loaded into a transformation server, and from there loaded into the target, wherever the data needs to be moved to. This is conventional ETL, where a transformation server is needed in between. ODI uses the next generation of architecture, called E-LT architecture. Data is still extracted from the sources, but instead of going through a transformation server in between, it is loaded into the target itself, and the transformation is executed in the target. This gives us several benefits. First, it leverages set-based transformations. This means we are not doing row-by-row processing; we can run set-based SQL or similar statements on the target, and that gives us performance. Our network hops are also reduced. We don't have to load the data twice; only one hop is needed, extracting from the sources and loading into the target, and that improves loading performance as well. The last benefit is that it takes advantage of existing infrastructure. We don't need a separate transformation server; we are using the target itself as the transformation server. That's a big advantage.

ODI also separates design time from implementation time. During design time, we use a very simple paradigm where you define what you want. During implementation, knowledge modules are used to create the physical implementation of the mapping. This separation makes it very easy for developers to create new mappings and reduces the learning curve, because you don't have to know anything about the implementation.
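The E-LT pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration using Python's sqlite3 module as a stand-in target database; the table names and rows are made up, and a real ODI deployment would generate SQL for a target such as Autonomous Database rather than run Python.

```python
import sqlite3

# Hypothetical sketch of the E-LT pattern, with SQLite standing in for the
# target database. All table and column names here are invented examples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1. "EL": extract from the source and load the raw rows straight into a
#    staging table in the target -- no intermediate transformation server.
cur.execute("CREATE TABLE stg_orders (region TEXT, amount REAL)")
source_rows = [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 75.0)]
cur.executemany("INSERT INTO stg_orders VALUES (?, ?)", source_rows)

# 2. "T": transform inside the target with a single set-based statement
#    (an aggregation over the whole set, not row-by-row processing).
cur.execute("CREATE TABLE dw_sales (region TEXT, total REAL)")
cur.execute("""
    INSERT INTO dw_sales (region, total)
    SELECT region, SUM(amount)
    FROM stg_orders
    GROUP BY region
""")

totals = dict(cur.execute("SELECT region, total FROM dw_sales"))
print(sorted(totals.items()))  # [('APAC', 75.0), ('EMEA', 150.0)]
```

The key point is step 2: because the transformation runs as one statement inside the target, there is only one network hop and no separate transform tier to provision.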
It also shortens the implementation time and makes maintenance easier, because the knowledge modules are maintained separately from the data integration mappings.

Let's look at how it works. We see a logical flow of a mapping where there are some sources and some transformations are happening, in this case a filter, lookups, joins, and aggregation, and data is loaded into the target. This is how the user wants the data to flow from the sources to the targets, and that's what we define in ODI Studio. During execution, the same flow is converted into a physical implementation. In this case, you can see that the joins, lookups, and aggregation happen in Autonomous Database, and from there, data is loaded into Cassandra. The same mapping can also be executed on a Hadoop cluster for big data, where Spark code is generated, depending on what you want to do. The mapping stays the same, but the physical implementation differs from one environment to another. The user does not have to know anything about how the mapping will be executed in a given execution environment.

Oracle Data Integrator can also be used for real-time data integration by integrating with GoldenGate. Let's see how GoldenGate works. GoldenGate is very good at replicating data from a set of sources, in this case your application tables, to a target. You can set up GoldenGate to do near real-time replication: as changes happen in the source, they get loaded into the target. After setting up GoldenGate, the captured changes can also be written into journalizing tables, and these journalizing tables can be used as a source for Oracle Data Integrator. With this integration between GoldenGate and Oracle Data Integrator, you can set up small micro-batch feeds to load into your warehouse or any other kind of target you want.
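The GoldenGate-plus-ODI micro-batch flow can be illustrated with a small sketch. This is a hypothetical simplification in plain Python: the journal rows and the apply logic are invented for illustration, standing in for a GoldenGate journalizing table that ODI polls in small, frequent batches.

```python
# Hypothetical micro-batch consumption of a change journal. Each journal
# row is (operation, primary key, new value); "I"/"U"/"D" stand for the
# insert/update/delete operations captured from the source tables.
journal = [
    ("I", 1, "alice"),   # insert captured from the source
    ("I", 2, "bob"),
    ("U", 1, "alicia"),  # later update to the same row
    ("D", 2, None),      # delete
]

def apply_micro_batch(target: dict, batch) -> None:
    """Apply one micro-batch of captured changes to the target."""
    for op, key, value in batch:
        if op in ("I", "U"):
            target[key] = value
        elif op == "D":
            target.pop(key, None)

target = {}
batch_size = 2
# Poll the journal in small, frequent batches instead of one large load,
# so the target stays close to the source without a full reload.
for i in range(0, len(journal), batch_size):
    apply_micro_batch(target, journal[i : i + batch_size])

print(target)  # {1: 'alicia'}
```

In a real setup, GoldenGate keeps the journalizing tables current in near real time, and ODI runs its (potentially complex, set-based) transformations against each small batch, which is what gives you the "best of both worlds" described next.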
Where data needs to be transformed in more complex ways than GoldenGate alone can handle, you can run those transformations at a very high frequency; we call these micro-batches, and you can take advantage of GoldenGate for that. Basically, you get the benefits of real-time replication and, at the same time, very high-frequency updates of your targets. You get the best of both worlds.

This concludes our lesson: Data Integration. Thank you for tuning in.
