Change Data Capture
What is Change Data Capture?
When data is replicated from a source database to a target, which could be another database, a data warehouse or cloud object storage, changes to the data in the source need to be captured and replicated to the destination to keep the data consistent and trustworthy. Change Data Capture, or CDC, is the process that makes this possible. CDC captures the changes in the source data and updates only the data in the destination that has changed. This does away with the tedious task of bulk reloading and enables real-time integration of data.
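As an illustration of the idea, here is a minimal sketch of applying captured changes to a destination. It is not BryteFlow's implementation: the `ChangeEvent` type, the `apply_changes` function and the in-memory dictionary standing in for the destination are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChangeEvent:
    """One captured change (hypothetical shape, for illustration only)."""
    op: str                               # "insert", "update" or "delete"
    key: int                              # primary key of the affected row
    row: Optional[dict[str, Any]] = None  # new row image; None for deletes


def apply_changes(target: dict[int, dict[str, Any]], events: list[ChangeEvent]) -> None:
    """Apply change events in order, touching only the rows that changed."""
    for event in events:
        if event.op == "delete":
            target.pop(event.key, None)
        elif event.op in ("insert", "update"):
            target[event.key] = dict(event.row or {})
        else:
            raise ValueError(f"unknown operation: {event.op}")


# The destination starts as a copy of the source at some earlier point.
target = {1: {"name": "Ann"}, 2: {"name": "Bob"}}

# Only the deltas travel to the destination, not the whole table.
events = [
    ChangeEvent("update", 1, {"name": "Anne"}),
    ChangeEvent("delete", 2),
    ChangeEvent("insert", 3, {"name": "Cy"}),
]
apply_changes(target, events)
print(target)  # {1: {'name': 'Anne'}, 3: {'name': 'Cy'}}
```

Three small events bring the destination up to date; no bulk reload of the table is needed.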
What’s special about BryteFlow’s Change Data Capture?
You cannot trust your data without data verification
BryteFlow data replication uses log-based Change Data Capture (CDC), reading changes from database transaction logs. It continuously applies only the rows that have been inserted or changed and avoids the time-consuming batch processing that is notorious for slowing systems down. Transaction logs are the gold standard for data replication and the fastest, most efficient way to capture changes to data. Further, log-based CDC has zero impact on the source system and does not impede workday operations. BryteFlow’s CDC features an optimized in-memory engine with Amazon EMR that continuously merges new change files with existing data in the Amazon S3 bucket, so your data always stays current.
See what the buzz is all about. Get in touch with us for a FREE Trial.
Methods of capturing data changes
Issues with Full Refresh
To pick up changes in source data, you could opt for a full refresh of the data in the destination. This is horribly time-consuming and a single point of failure (don’t say you weren’t warned!). Organisations usually rely on a nightly batch load to run the full refresh. If there is a spike in data, or a network or infrastructure issue, the full extract fails, leaving the business with no reports the next day. Rerunning it during the day will have your people pulling out their hair in frustration, as it overloads the source and disrupts normal business workloads. So the next available window is the following night, and the backlog compounds into a nightmare.
Issues with using timestamps or incrementing sequences
Organisations sometimes try to handle updates by relying on the create or update dates stored on each record, assuming that the source even logs these. In most cases these dates are populated incorrectly, so changed data is extracted incorrectly or, worse, missed completely. There is a high probability that some updates do not change the update date of the record and are therefore never picked up. But by far the biggest failing of this approach is that deletes can never be captured, leading to inconsistent, incorrect data on the destination.
Incrementing sequences fare no better: they can only capture inserts, never updates or deletes. And even when organisations maintain that they never update or delete their records, the occasional update or delete that does slip through will wreak havoc on downstream systems because of this architectural approach.
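The failure modes of timestamp-based extraction can be reproduced in a few lines. The following is a toy sketch: the table layout, the `updated_at` column name and the `extract_since` helper are assumptions made for the example, not part of any product.

```python
from datetime import datetime


def extract_since(source_rows, last_run):
    """Timestamp-based extract: pick up rows whose updated_at is after the last run."""
    return [row for row in source_rows if row["updated_at"] > last_run]


last_run = datetime(2024, 1, 1)
source = [
    {"id": 1, "name": "Ann", "updated_at": datetime(2023, 12, 1)},
    {"id": 2, "name": "Bob", "updated_at": datetime(2023, 12, 1)},
]
# The destination was in sync with the source at the time of the last run.
destination = {row["id"]: dict(row) for row in source}

# Changes made in the source after the last run:
source[0]["name"] = "Anne"  # update that did not touch updated_at
del source[1]               # hard delete of id 2
source.append({"id": 3, "name": "Cy", "updated_at": datetime(2024, 1, 2)})

# The next incremental run only sees the new insert.
for row in extract_since(source, last_run):
    destination[row["id"]] = dict(row)

source_by_id = {row["id"]: row for row in source}
stale = [i for i, row in destination.items()
         if i in source_by_id and row["name"] != source_by_id[i]["name"]]
orphaned = [i for i in destination if i not in source_by_id]

print(stale)     # [1]  the update was missed
print(orphaned)  # [2]  the delete was never seen
```

Only the insert reaches the destination. The update is lost because nothing refreshed the timestamp, and the deleted row leaves no record behind for the extract to find.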
The case for Change Data Capture
Change Data Capture using database transaction logs is by far the most enterprise grade mechanism to get access to your data from database sources. It has zero impact on the source and data can be extracted real-time or at a scheduled frequency, in bite-size chunks and hence there is no single point of failure. Change data capture technology gets data from database logs and gets only the deltas. But MOST importantly, it can capture all inserts, updates and deletes, making the data trustworthy at the destination.
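To make the "bite-size chunks" point concrete, here is a minimal sketch of reading a change log in small batches from a saved position. The in-memory `log`, the `(lsn, op, key)` entry layout and the `read_batches` function are hypothetical; real database logs and their readers are considerably more involved.

```python
from typing import Iterator

# Hypothetical log entry: (log sequence number, operation, primary key)
LogEntry = tuple[int, str, int]


def read_batches(log: list[LogEntry], start_lsn: int, batch_size: int) -> Iterator[list[LogEntry]]:
    """Yield log entries after start_lsn in small batches."""
    pending = [entry for entry in log if entry[0] > start_lsn]
    for i in range(0, len(pending), batch_size):
        yield pending[i:i + batch_size]


log = [
    (1, "insert", 10),
    (2, "update", 10),
    (3, "insert", 11),
    (4, "delete", 10),
    (5, "insert", 12),
]

checkpoint = 0  # last log position that was safely applied
applied = []
for batch in read_batches(log, checkpoint, batch_size=2):
    applied.extend(batch)      # stand-in for applying the batch to the destination
    checkpoint = batch[-1][0]  # advance only after the batch has been applied

print(checkpoint)  # 5
```

Because the checkpoint advances one batch at a time, a failure part-way through costs at most one small batch, and the next run resumes from the saved position instead of starting over.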
BryteFlow Ingest is a data replication tool that uses log-based CDC. It is unique in that its throughput outpaces most of the competition while needing only minimal security privileges. You are assured of low impact, high throughput and guaranteed delivery of data.
Take a first hand look at our Change Data Capture. Get a FREE Trial
CDC for Multiple Destinations
Change Data Capture for Amazon S3
BryteFlow’s next-gen CDC technology easily keeps up with changes happening in the source and applies them on Amazon S3, where your data resides. By leveraging the power of other AWS services, BryteFlow turns Amazon S3 from simple cloud object storage into a superb analytics environment.
Change Data Capture for Amazon Redshift
Now you can rest easy when you are delivering data to Amazon Redshift. BryteFlow’s Change Data Capture technology creates large numbers of separate files for new record inserts, updates and deletes. It continuously merges new change files with existing data on Amazon Redshift so your data always stays current.
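The general pattern of merging staged changes into an existing table can be sketched as follows. This is an assumption-laden illustration, not BryteFlow's merge logic: SQLite (via Python's built-in `sqlite3` module) stands in for the warehouse, the table names are invented, and Amazon Redshift's own SQL syntax for this differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE staged_upserts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE staged_deletes (id INTEGER PRIMARY KEY);

    -- Existing data in the destination.
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');

    -- Newly arrived change files, loaded into staging tables.
    INSERT INTO staged_upserts VALUES (1, 'Anne'), (3, 'Cy');
    INSERT INTO staged_deletes VALUES (2);
""")


def merge_staged_changes(conn: sqlite3.Connection) -> None:
    """Merge staged deletes and upserts into the target in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "DELETE FROM customers WHERE id IN (SELECT id FROM staged_deletes)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO customers (id, name) "
            "SELECT id, name FROM staged_upserts"
        )
        # Clear the staging tables so the next batch starts empty.
        conn.execute("DELETE FROM staged_upserts")
        conn.execute("DELETE FROM staged_deletes")


merge_staged_changes(conn)
rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(rows)  # [(1, 'Anne'), (3, 'Cy')]
```

Running the merge inside a single transaction means readers see either the old state or the fully merged state, never a half-applied batch.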
Change Data Capture for Snowflake
To take advantage of Snowflake’s real-time data pipelines, you need a CDC technology that can keep up. BryteFlow’s CDC technology captures changes in your source database in real-time and moves only the changed data to Snowflake, merging it with existing data continuously. BryteFlow’s real-time monitoring also alerts you if data is missing, so your data is always complete.
BryteFlow Ingest & XL Ingest
BryteFlow Ingest is our data replication tool extraordinaire. It uses a proprietary technology to replicate huge volumes of data from multiple sources at dizzying speeds to Amazon S3 in real-time. While BryteFlow Ingest replicates large databases effortlessly, XL Ingest is intended for huge, petabyte-scale databases.
- Completely codeless and automated data replication.
- Ingest data automatically in real-time from hundreds of sources.
- Access data immediately with real-time replication of your source in the data lake.
- Efficiently manage transactional data and sync changes continuously.
- Get a range of data conversions out of the box including Typecasting and GUID data type conversion.
- Retrieve data from any point on the timeline with the timestamping feature.
- Automatic catch-up from network dropout.
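To illustrate how timestamping can make point-in-time retrieval possible, here is a minimal sketch that keeps every version of a row together with the time it changed. The `history` structure and the `record_version` and `as_of` functions are hypothetical and say nothing about how BryteFlow stores its data.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Any, Optional

Row = Optional[dict[str, Any]]  # None marks a deleted row

# key -> list of (changed_at, row) versions; assumed to be appended in time order
history: dict[int, list[tuple[datetime, Row]]] = {}


def record_version(key: int, changed_at: datetime, row: Row) -> None:
    """Append a new timestamped version instead of overwriting the old one."""
    history.setdefault(key, []).append((changed_at, row))


def as_of(key: int, moment: datetime) -> Row:
    """Return the row as it looked at the given moment, or None if absent."""
    versions = history.get(key, [])
    timestamps = [changed_at for changed_at, _ in versions]
    position = bisect_right(timestamps, moment)
    return versions[position - 1][1] if position else None


record_version(1, datetime(2024, 1, 1), {"name": "Ann"})
record_version(1, datetime(2024, 2, 1), {"name": "Anne"})
record_version(1, datetime(2024, 3, 1), None)  # row deleted

print(as_of(1, datetime(2024, 1, 15)))  # {'name': 'Ann'}
print(as_of(1, datetime(2024, 2, 15)))  # {'name': 'Anne'}
print(as_of(1, datetime(2024, 3, 15)))  # None
```

Because changes are appended rather than overwritten, any earlier state can be reconstructed by picking the latest version at or before the requested moment.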