User Guide

About BryteFlow Ingest for Amazon S3 Data Lakes

BryteFlow Ingest for Amazon S3/Redshift Data Lake is high-performance software that facilitates real-time data capture from OLTP (online transaction processing) systems to analytical platforms and other databases with minimal impact on the source system. BryteFlow Ingest captures changes in real time and transfers them to the target system.

BryteFlow Ingest for S3 Data Lakes does an initial full extract from the source and then incrementally merges changes from the source database to Amazon S3 and/or Amazon Redshift for selected tables. The entire process is fully automated.

Prerequisite

Using Amazon Machine Image (AMI).

System Requirements

  • Ensure connectivity between the server hosting the BryteFlow CDC software and the sources, AWS S3, AWS EMR, and AWS Redshift.
  • Ensure your AWS S3 and AWS Redshift are able to access your EMR instance.
  • Port 8080 should be open on the server hosting the BryteFlow CDC software.
  • Google Chrome browser is required as the internet browser on the server hosting BryteFlow CDC.
  • Java version 7 or higher is required. Java is pre-installed on the AMI.
  • For a SQL Server source, the BCP utility should be installed on the server hosting the software.

Licence

To get a valid licence, go to the Configuration tab, then to the Licence tab, and email the "Product ID" to the Bryte support team at support@bryteflow.com.

This is not required if using an AMI from the AWS Marketplace.

Dashboard

The Dashboard provides quick access to the configuration of BryteFlow CDC.

The source, destination database, file system and email configurations can be easily accessed by clicking on the respective icon.

The Data Sources Transfer Summary shows the number of records transferred.
When Hourly is selected you can view the transfer statistics for the last 24 hours; if Daily is selected, the monthly statistics are displayed.
Hovering over the bar graph shows the exact number of records transferred.
The configure icon allows the configuration of the source tables, the type of transfer, and the source table primary keys.

Schedule Extract Status displays the schedule status.
Here the scheduler can be controlled; for more detailed configuration, click on the configure icon.

The pie chart displays the status of the process: extraction is denoted by red, loading by orange, and loaded by green.

Setting up Source

Network Settings

Opening Ports in Amazon Console & Windows Server

Open ports on Windows Server

Please perform the steps to allow the inbound traffic to your server, as mentioned in this link: https://technet.microsoft.com/en-us/library/ms345310(v=sql.100).aspx

Open ports on Amazon Console.

Please perform the steps to allow the inbound traffic to your AWS instance, as mentioned in this link: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/authorizing-access-to-an-instance.html

Testing The Connections

Verify that connectivity to the remote services is available.

To test the remote connections you need the telnet utility. Telnet has to be enabled from the Control Panel under Turn Windows features on or off.

  1. Go to Start, then Run, type CMD, and click OK.
  2. Type the following at the command prompt.

telnet <IP address or FQDN> <port number>

e.g. telnet 192.168.1.1 8080

If the connection is unsuccessful, an error will be shown.
If the command prompt window is blank with only a cursor, then the connection is successful and the service is available.

Preparing MS SQL Server

Enable Change Tracking for a database in SQL Server

This section applies to MS SQL Server; the version should be 2008 or higher.

  • To enable Change Tracking at the database level, execute the following query:
    ALTER DATABASE databasename
    SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON)
  • To enable Change Tracking at the table level, execute the following query:
    ALTER TABLE tablename
    ENABLE CHANGE_TRACKING
    WITH (TRACK_COLUMNS_UPDATED = ON)
  • Enable Change Tracking at the database and table level for all the databases and tables to be replicated.

Preparing On-premise Oracle

Enable Change Tracking for an On-Premise Oracle Server

Execute the following queries on Oracle Server to enable change tracking.

  • The Oracle database should be in ARCHIVELOG mode.
  • Supplemental logging has to be turned on at the database level. Supplemental logging is required so that additional details are logged in the archive logs.
    To turn on supplemental logging at the database level, execute the following statement:
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
  • Alternatively, to turn on minimal database supplemental logging, execute the following statements:
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA; ALTER DATABASE FORCE LOGGING;
  • In Oracle, ensure that supplemental logging is turned on at the table level. To turn on supplemental logging at the table level, execute the following statement:
    ALTER TABLE <schema>.<tablename> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

Preparing Oracle on Amazon RDS

Enable Change Tracking for a database on Amazon Oracle RDS

  • In Oracle on Amazon RDS, the supplemental logging should be turned on at the database level.
  • Supplemental logging is required so that additional details are logged in the archive logs.
    To turn on supplemental logging at the database level, execute the following command:
    exec rdsadmin.rdsadmin_util.alter_supplemental_logging('ADD','ALL');
  • To retain archived redo logs on your DB instance, execute the following command (example: 24 hours):
    exec rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours',24);
  • To turn on supplemental logging at the table level, execute the following statement:
    ALTER TABLE <schema>.<tablename> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

Preparing On-premise MySQL

To prepare MySQL for change tracking, perform the following steps.

To enable binary logging, the following parameters need to be configured as shown below in the my.ini file on MySQL on Windows or in the my.cnf file on MySQL on UNIX (an example excerpt follows the parameter table):

Parameter           Value
server_id           Any value from 1, e.g. server_id = 1
log_bin=<path>      Path to the binary log file, e.g. log_bin = D:\MySQLLogs\BinLog
binlog_format       binlog_format=row
expire_logs_days    To avoid disk space issues it is strongly recommended not to use the default value (0), e.g. expire_logs_days = 4
binlog_checksum     This parameter can be set to binlog_checksum=none; BryteFlow supports CRC32 as well
binlog_row_image    binlog_row_image=full
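
As an illustration only, a minimal my.ini excerpt based on the parameters above might look like the following; the server_id value and the log path are examples and should be adjusted for your environment:

[mysqld]
server_id = 1
log_bin = D:\MySQLLogs\BinLog
binlog_format = row
binlog_row_image = full
binlog_checksum = none
expire_logs_days = 4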

Preparing MySQL on Amazon RDS

Enabling Change Tracking for MySQL on Amazon RDS

To enable change tracking for MySQL on Amazon RDS, perform the following steps.

  1. In the AWS Management Console, create a new DB parameter group for MySQL on Amazon RDS and configure the following parameters as shown (an example using the AWS CLI follows these steps).
  2. The MySQL RDS DB instance should use the newly created DB parameter group for binary logging to be enabled.
binlog_format = row
binlog_checksum = none or CRC32
Data Types in MS SQL Server

BryteFlow Ingest for MS SQL Server as a source supports most SQL Server data types. The following table shows the SQL Server source data types that are supported by the Ingest application:

SQL Server Data Types

BIGINT REAL VARCHAR (max)
BIT FLOAT NCHAR
DECIMAL DATETIME NVARCHAR (length)
INT DATETIME2 NVARCHAR (max)
MONEY SMALLDATETIME BINARY
NUMERIC (p,s) DATE VARBINARY
SMALLINT TIME VARBINARY (max)
SMALLMONEY DATETIMEOFFSET TIMESTAMP
TINYINT CHAR UNIQUEIDENTIFIER
VARCHAR HIERARCHYID XML

Security Permissions Required on Source

Security for MS SQL

Security requirements for MS SQL

The BryteFlow Ingest database replication login user should have the VIEW CHANGE TRACKING permission to view the Change Tracking information.

-- Review all change tracking tables (is_track_columns_updated_on: 1 = enabled, 0 = disabled)
SELECT *
FROM sys.all_objects
WHERE object_id IN (SELECT object_id
                    FROM sys.change_tracking_tables
                    WHERE is_track_columns_updated_on = 1);
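
As a sketch only, the permission can be granted with a statement along the following lines, run here through sqlcmd; the server name, database name, schema (dbo) and login name bflow_user are placeholder examples, not names used by BryteFlow itself:

sqlcmd -S <server> -d databasename -Q "GRANT VIEW CHANGE TRACKING ON SCHEMA::dbo TO bflow_user;"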

Security for Oracle

Security Requirements for the CDC user in Oracle

The Oracle user running BryteFlow CDC must have the following security privileges:

  1. Select access on all tables to be replicated
  2. SELECT * FROM V$ARCHIVED_LOG should return records.
    If no records are returned, select access on V_$ARCHIVED_LOG should be provided, or check if the database is in ARCHIVELOG mode.
  3. The following security permissions should be assigned to the user:
    CREATE SESSION
    Select access on V_$LOGMNR_CONTENTS
    Select access on V_$LOGMNR_LOGS
    SELECT ANY TRANSACTION privilege
    Select access on DBA_OBJECTS
    Execute access on DBMS_LOGMNR

Run the following grant statements for <user> to satisfy the above requirements:

GRANT SELECT ON V_$ARCHIVED_LOG TO <user>;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO <user>;
GRANT EXECUTE ON DBMS_LOGMNR TO <user>;
GRANT SELECT ON V_$LOGMNR_LOGS TO <user>;
GRANT SELECT ANY TRANSACTION TO <user>;
GRANT SELECT ON DBA_OBJECTS TO <user>;

Security for MySQL

The CDC user id must have the following privileges:

  1. REPLICATION CLIENT and REPLICATION SLAVE privileges.
  2. Select privileges on the source tables designated for replication.
  3. Execute the following queries to grant permissions to a MySQL user.
CREATE USER 'bflow_cdc_user' IDENTIFIED BY '*****';
GRANT SELECT, REPLICATION CLIENT, SHOW DATABASES ON *.* TO bflow_cdc_user IDENTIFIED BY '*****';
GRANT SELECT, REPLICATION SLAVE, SHOW DATABASES ON *.* TO bflow_cdc_user IDENTIFIED BY '*****';

Verification of Source

Verification of MS SQL source

Verify that the MS SQL source is set up correctly.

  1. To verify if change tracking is already enabled on the database, run the following SQL query. If a row is returned, then Change Tracking has been enabled for the database:
    SELECT *
    FROM sys.change_tracking_databases
    WHERE database_id=DB_ID('databasename')
  2. To check if change tracking is already enabled on a table in the database, run the following SQL query. This will list all the tables for which Change Tracking has been enabled:
    use databasename;
    SELECT sys.schemas.name as Schema_name,
    sys.tables.name as Table_name
    from sys.change_tracking_tables
    join sys.tables on sys.tables.object_id =
    sys.change_tracking_tables.object_id
    join sys.schemas on sys.schemas.schema_id = sys.tables.schema_id

Verification of Oracle source

Verify that Oracle is set up correctly

To verify that Oracle is set up correctly for change detection, execute the following queries.

  • Is ARCHIVELOG mode enabled?
    SQL to be executed: SELECT log_mode FROM V$DATABASE;
    Result expected: ARCHIVELOG
  • Is supplemental logging turned on at the database level?
    SQL to be executed: SELECT supplemental_log_data_min FROM V$DATABASE;
    Result expected: YES
  • Is supplemental logging turned on at the table level?
    SQL to be executed: SELECT log_group_name, table_name, always, log_group_type FROM dba_log_groups;
    Result expected: <log group name>, <table name>, ALWAYS, ALL COLUMN LOGGING

Creating Amazon Services

Creating An EC2 System

Please refer to the AWS documentation on how to create an EC2 instance.

The documentation can be accessed here: https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/EC2_GetStarted.html

Creating S3 Bucket

Please refer to the AWS documentation for creating an S3 bucket.
The documentation can be accessed here: http://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html
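
If you prefer the command line, a bucket can also be created with the AWS CLI; the bucket name and region below are examples only:

aws s3 mb s3://my-bryteflow-bucket --region us-west-2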

Configuring EMR Cluster

  1. Log in to your AWS account and select the correct AWS region where your S3 bucket and EC2 instance are located.
  2. Click on the services drop down in the header.
  3. Select EMR under Analytics or you can search for EMR.
  4. Click on the Create cluster button.
  5. In Create Cluster - Quick Options, type in a Cluster Name (the name you will identify the cluster with) and
    keep the Logging check box selected; the S3 folder will be selected by default. Launch mode should be Cluster.
  6. Under Software configuration select release emr-5.14.0 and in Applications select Core Hadoop: Hadoop 2.8.3 with Ganglia 3.7.2, Hive 2.3.2, Hue 4.1.0, Mahout 0.13.0, Pig 0.17.0, and Tez 0.8.4.
  7. Hardware configuration - select the instance type and the number of instances you want to run.
  8. Security and access –
    Please select the EC2 key pair that you want to use with the EMR Cluster. This key will be used to SSH into the Cluster. Permission should be set to default.
  9. Click Create Cluster. (Provisioning of a cluster can take up to 15-20 min).
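
An equivalent cluster can also be provisioned from the AWS CLI. The following is a sketch based on the quick options above; the cluster name, key pair, log bucket, instance type and instance count are example values:

aws emr create-cluster \
    --name "BryteFlow-EMR" \
    --release-label emr-5.14.0 \
    --applications Name=Hadoop Name=Ganglia Name=Hive Name=Hue Name=Mahout Name=Pig Name=Tez \
    --instance-type m4.large \
    --instance-count 3 \
    --ec2-attributes KeyName=my-key-pair \
    --log-uri s3://my-bryteflow-bucket/emr-logs/ \
    --use-default-roles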

Configuring CloudWatch/SNS

Configuring CloudWatch

Bryteflow Ingest supports connection to Cloudwatch Logs and Cloudwatch Metrics. This can be used to monitor the operation of Bryteflow Ingest and integrate with other assets leveraging the AWS infrastructure.

AWS Cloudwatch Logs can be used to send logs of events like ‘load completion’ or ‘load failure’ from Bryteflow Ingest. Cloudwatch Logs can be used to monitor error conditions and raise alarms.

AWS Cloudwatch Metrics can be used to monitor metrics for Bryteflow Ingest like time taken to execute jobs and the time jobs wait for the EMR to become available. These metrics can be monitored on the AWS dashboard and alarms can be set when the metrics cross a predefined threshold.

These connections are optional and are used only if you decide to interface with Cloudwatch Logs and Metrics.

Steps to set up CloudWatch Logs, Metrics and SNS:

  1. Place the CWLogs.zip and CWMetrics.txt files in the config directory. Edit the config.xml file to add the following sections.

<Site>
    <Param name="id" value="CWLOG"/>
    <Param name="log_group" value="BryteFlow"/>
    <Param name="log_stream" value="[Stream Name]"/>
    <Param name="region" value="[region]"/>
    <Param name="s3key" value="[IAM Key]"/>
    <Param name="s3skey" value="[IAM Secret Key]"/>
</Site>

<Site>
    <Param name="id" value="CWMETRIC"/>
    <Param name="region" value="[region]"/>
    <Param name="s3key" value="[IAM Key]"/>
    <Param name="s3skey" value="[IAM Secret Key]"/>
</Site>
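
If the log group and log stream referenced in the CWLOG section do not already exist in CloudWatch, they can be created with the AWS CLI; the stream name below is an example:

aws logs create-log-group --log-group-name BryteFlow
aws logs create-log-stream --log-group-name BryteFlow --log-stream-name bryteflow-ingest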

Configuring SNS

Bryteflow Ingest supports connection to AWS SNS. This can be used to send SNS messages on significant events like load completion or failure. This can be used to coordinate other tasks on the AWS infrastructure or for generating alarms and notifications.

AWS SNS is a publish/subscribe notification service that can be used to reliable route notifications to multiple endpoints. This can be used to trigger event-driven computing and workflows.

Place the SNS.zip file in the config directory. Edit the config.xml file to add the following section.

<Site>
    <Param name="id" value="SNS"/>
    <Param name="region" value="[region]"/>
    <Param name="s3key" value="[IAM Key]"/>
    <Param name="s3skey" value="[IAM Secret Key]"/>
    <Param name="topic" value="arn:aws:sns:[sns topic]"/>
</Site>
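
An SNS topic ARN is needed for the topic parameter above. A topic can be created, and an email endpoint subscribed to it, with the AWS CLI; the topic name, region, account id and email address below are placeholders:

aws sns create-topic --name bryteflow-events
aws sns subscribe \
    --topic-arn arn:aws:sns:us-west-2:123456789012:bryteflow-events \
    --protocol email \
    --notification-endpoint ops@example.com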

Starting & Stopping BryteFlow CDC

If you are using the AMI, BryteFlow CDC will be preinstalled as a service in Windows.

Alternatively, you can install the service by executing the following command using the Command Prompt (Admin).

  1. Navigate to the directory of the installation.
  2. service.exe --WinRun4J:RegisterService

To Start BryteFlow CDC 

  1. Start the BryteFlow CDC service using Windows Services or Windows Task Manager.
  2. Type the URL localhost:8080/bryteflow/wv# in the Chrome browser.

To Stop Bryteflow CDC 

  1. Stop the BryteFlow CDC service or place an empty file called "exit" in the installation directory. This indicates to the replication process that it can exit gracefully when it has completed the load for the current extraction set.
  2. Replication processes can also be ended immediately by going to Task Manager
    -> Processes -> service.exe – and selecting “End Task”
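
The service can also be controlled from an elevated Command Prompt. The service name below is an assumption; use the name under which the service was registered on your server, as shown in Windows Services:

sc start "BryteFlow CDC"
sc stop "BryteFlow CDC"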

Configure Source Connections

Configure MS SQL Source Connections

To configure MS SQL as the source database perform the following steps.

  1. In the database type select “Microsoft SQL Server (Change Tracking)” from the drop-down list.
  2. In the database host field please enter the IP address or FQDN of the database server.
  3. In the database port field, type in the port number on which the database server is listening. The default port for MS SQL is 1433.
  4. In the database name field type in the name of your database e.g. DEMODB.
  5. Enter a valid User Id on MS SQL server that will be used with CDC.
  6. Enter password; confirm.
  7. Click on the Test Connection button and then the Apply button to test the connection and save the details.

Configure Oracle Source Connections

To configure Oracle as the source database perform the following steps.

  1. In the database type select “Oracle Change Logs” from the drop-down list.
  2. In the database host field please enter the IP address or FQDN of the database server.
  3. In the database port field, type in the port number on which the database server is listening. The default port for Oracle is 1521.
  4. In the database name field type in the name of your service on which your Oracle is listening e.g. DEMODB.
  5. Enter a valid User Id on Oracle server that will be used with Bryteflow Ingest.
  6. Enter password; confirm.
  7. Click on the Test Connection button and then the Apply button to test the connection and save the details.

Configure Settings for Destination Database

To configure Destination as Amazon S3 perform the following steps:

EMR/S3

  1. Enter Database Type: To use Amazon S3 as the destination, select EMR/S3.
  2. Enter Database Host: Enter the id of the EMR cluster e.g.  j-1L50XZWUXXXXX.
  3. Click on Test Connection and then Apply to test the connection and save the details.
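
If you do not have the EMR cluster id handy, it can be looked up with the AWS CLI; the id is the value beginning with j-:

aws emr list-clusters --active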

To configure Destination as Amazon Redshift perform the following steps:

AWS Redshift (EMR)

  1. Enter Database Type: To use Amazon S3 and Amazon Redshift as your destination, select “AWS Redshift (EMR)”
  2. Enter Database Host: Enter the endpoint for Amazon Redshift
    eg. xxxx.cayraetXXXXXX.us-west-2.redshift.amazonaws.com
  3. Enter Database Port: 5439 is the default port for Redshift
  4. Enter Database Name: eg dev
  5. Enter User Id: This is the Redshift user id that will load the schemas, tables, and data
    automatically to Redshift.
  6. Enter Password; confirm
  7. Click on Test Connection and then Apply to test the connection and save the details
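
The Redshift endpoint can be looked up with the AWS CLI if needed; the cluster identifier below is an example value:

aws redshift describe-clusters --cluster-identifier my-redshift-cluster --query "Clusters[0].Endpoint.Address" --output text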

Configure Destination File Systems

File System Configurations

To Configure S3 as the filesystem perform the following steps.

  1. Select File System as AWS S3 with EMR from the drop-down.
  2. In the bucket name field, enter the bucket name that you have created on AWS S3.
  3. In the Delta Directory and Data Directory field, type in the name of the directories you want to use.
  4. Enter the AWS EMR instance ID mentioned on your AWS EMR Cluster details page.
  5. In EMR Region and S3 Region select the correct regions from the drop-down list.
  6. Enter S3 key and S3 secret key for accessing the S3 service.
  7. If you are using a KMS id, you can enter it instead of the S3 key and S3 secret key.
  8. You can click on the Test Directory button to check that you are able to connect to the AWS S3 directory; click on Apply to save the changes.

Configure Data Sources

Configure Data Sources by selecting individual tables

To select the table for transferring to destination database on AWS Redshift or AWS S3 bucket perform the following steps.

  1. Expand the Database.
  2. Browse to the table you want to be synced with AWS Redshift database or AWS S3.
  3. Select the checkbox next to the table and then click on the table.
  4. On the right-hand side pane, select the type of transfer for the table, i.e. By Primary Key or By Primary Key with History.
  5. In the Primary Key column, select the Primary Key for the table by checking the checkbox next to the column name.
  6. You can also mask a column by checking the checkbox. By masking a column, the selected column will not be transferred to the destination database.

Transfer Types

Full Extract -> Full Replace
Extract the entire table from the source database and load to target database.

Full Extract -> By Primary Key
Extract the entire table from the source database to the target and specify a Primary Key.

Full Extract -> By Primary Key with History
Extract the entire table from the source database, and populate the table in the destination database with a history of all changes. The end_dt is set to ‘9999-12-31’ if the record is current.

Delta Extract -> By Primary Key
Extract only the delta changes, and use the defined primary key to populate the changes. The table structure is identical to the source database.

Delta Extract -> By Primary Key with history
Extract only delta changes, and use the defined primary key to populate the changes with history. The table structure is the same as the source with extra columns eff_dt and end_dt to maintain SCD Type 2 history. The end_dt is set to ‘2999-12-31’ if the record is current.

Note: Delta Extract by Primary Key and Delta Extract by Primary Key with history are the transfer types that should be used for most extracts where primary key columns are known.

Redshift schemas
Tables are loaded to the equivalent schema in Redshift. For example, a source table called 'customer' in the test_db database and dbo schema will be loaded to the test_db_dbo schema in Redshift, and the table will still be called 'customer'. Source data types are converted to equivalent data types in Redshift.
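
As a quick sanity check after a load, the target table can be queried directly. This sketch assumes the psql client and reuses the example endpoint, database and table names mentioned above; substitute your own values:

psql -h xxxx.cayraetXXXXXX.us-west-2.redshift.amazonaws.com -p 5439 -d dev -U myuser \
     -c "SELECT COUNT(*) FROM test_db_dbo.customer;"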

Configure Email Notification

Configure settings for Email notification

To configure email notifications to be sent, perform the following steps.

  1. Choose Mail Type: SMTP using TLS from the drop-down.
  2. In the Host field, type in the address of your SMTP server.
  3. In the Port field, type in the port number on which the SMTP server is listening.
  4. In the User Id field, type the complete email address that will authenticate with the SMTP server.
  5. Enter the password for the email; confirm.
  6. In Send From, enter the email address the notifications will be sent from; it has to be a valid email address on the server.
  7. In the Send To field, enter the email address to which the notifications are sent.
  8. Click on Test Connection and then Apply to test the connection and save the settings.

Replicate with BryteFlow CDC

Once the configuration of the sources has been completed, complete the following steps to start replicating your data from the source to the destination.

  • Make sure a valid Licence key has been applied.
  • Stop the batch file and start it again for the newly applied licence key to take effect.
  • Go back to the browser and use the same URL again; the connection information should be available.
  • Go to the Data Sources tab and select the table and the type of extraction.
  • Select the Primary Key and any Mask columns, then click the Full Extract and Apply buttons. This will prompt you to go to the Schedule tab and click the Sync new tables button.

Understanding the Extraction Process

Extraction Process

Understanding Extraction.

Extraction has two parts to it.

  1. Initial Extract.
  2. Delta Extract.

Initial Extract.

An initial extract is done the first time a database is connected to the BryteFlow Ingest software. In this extract, the entire table is replicated from the source database to the destination (AWS S3 or AWS Redshift).

A typical extraction goes through the following stages. The example below shows an extraction with MS SQL Server as the source and an Amazon S3 bucket as the destination.

Extracting 1
Full Extract database_name:table_name
Info(ME188): Stage pre-bcp
Info(ME190): Stage post-bcp
Info(ME260): Stage post-process
Extracted 1
Full Extract database_name:table_name complete (4 records)
Load file 1
Loading table emr_database:dbo.names with 4 records(220 bytes)
Transferring null to S3
Transferred null 10,890 bytes in 8s to S3
Transferring database_name_table_name to S3

Delta Extract.

After the initial extract, when the database is replicated to the destination, we do a delta extract. In delta extracts, only the changes on the source database are extracted and merged with the destination.

After the initial extraction is done, all further extracts are Delta Extracts (changes since the last extract).

A typical delta extract log is shown below.

Extracting 2
Delta Extract database_name:table_name
Info(ME188): Stage pre-bcp
Info(ME190): Stage post-bcp
Info(ME260): Stage post-process
Delta Extract database_name complete (10 records)
Extracted 2
Load file 2
Loaded file 2

First Extract

Extracting Database for the first time.

Keep all defaults. Click on Full Extract.

The first extract always has to be a Full Extract. This gets the entire table across, and then the deltas are populated periodically at the desired frequency.

Schedule Extract

To configure extracts to run at a specific time perform the following steps.

  1. In the case of Oracle, Automatic is preselected and other options are disabled by default.
  2. For SQL Server you can choose the period in minutes.
  3. A daily extraction can be done at a specific time of the day by choosing hour and minutes in the drop-down.
  4. Extraction can also be scheduled on specific days of the week at a fixed time by checking the checkboxes next to the days and selecting hours and minutes in the drop-down.
  5. Click Apply to save the schedule.

Add a new table to existing Extracts

After databases have been selected for extraction and are replicating, a new table can be added to the extraction process with the following steps.

  • Slide the Scheduler to “Off”.
  • Go to the ‘Data Sources’ tab.
  • Select the new table(s).
  • Configure the table to specify the transfer type.
  • Select Primary Key columns (masked columns are optional).
  • Go to the 'Schedule' tab, then the Configure tab.
  • Click on Sync New Tables.

This will include the new table for a full extract and then resume with deltas for all the configured tables.

Resync data for Existing tables

If the Table transfer type is Primary Key with History, to resync all the data from source, perform the following steps

  • Slide Scheduler to “Off”.
  • For Resync Data on ALL configured tables – Go to Schedule tab, Configure tab and Click on Full Extract.
  • For Resync Data on selected tables, go to Data Sources, select table and click on Full Extract. Then go to Schedule Extract, Configure and select Sync New tables.

General Configuration

General configuration setting of Source Database


Web Port: The port on which the BryteFlow CDC server will run.

Max Catchup Log: The number of Oracle archive logs that will be processed at a time.

Minimum Interval between Catchups: The minimum minutes between catchup batches.

Default transfer type: Default transfer option applied when not defined at the table level.

Handle Oracle raw columns: Handle raw columns by converting to hex string instead of ignoring as CHAR(1).

General configuration settings of Destination Database

Max Update: Combine updates that exceed this value.

Loading Threads:  Number of Redshift loading threads.

Destination Schema:  Ignore the source schema and put all tables in this schema on the destination.

Staging Schema:  Schema for staging tables.

Retaining Staging Table:  Retain staging tables.

Source start date:  Column name for source date.

History End Date Column:  Column name for history end date

History End Date Value:  End date used for history.

Avoid DB Prefix for Schema:  Don’t use DB as part of schema prefix for SQL Server.

Number of Data Slices:  Number of slices to split the data file into.

EMR S3 Compression Method:  Compression method (gzip or bzip2)

General configuration settings of S3

Keep S3 Files: Retain files in S3 after loading into AWS Redshift.

Use SSE:  Store in S3 using SSE (server-side encryption).

S3 Proxy Host: S3 proxy host name.

S3 Proxy Host Port:  S3 proxy port.

S3 Proxy user ID:  S3 proxy user id.

S3 Proxy Password:  S3 proxy password.

Licence Copy

General configuration settings of Licence Copy

A valid license key can be obtained by emailing the Product ID to the support team at support@bryteflow.com

Monitoring

You can monitor the progress of your extracts by going to the Log tab in the menu.

The log files are stored in the log folder of your BryteFlow CDC installation.
The path to the log file is <install dir of bflowcdc>\log\sirus*.log
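
To follow the most recent log from the command line, something along these lines can be used; the install directory is a placeholder and the command assumes PowerShell is available on the server:

powershell -Command "Get-Content '<install dir of bflowcdc>\log\sirus*.log' -Tail 50"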

Appendix: Bryte Events for AWS CloudWatch Logs and SNS

BryteFlow Ingest supports connection to AWS Cloudwatch Logs, Cloudwatch Metrics and SNS. This can be used to monitor the operation of Bryteflow Ingest and integrate with other assets leveraging the AWS infrastructure.

AWS Cloudwatch Logs can be used to send logs of events like load completion or failure from Bryteflow Ingest. Cloudwatch Logs can be used to monitor error conditions and raise alarms.

Below is the list of events that BryteFlow Ingest pushes to the AWS CloudWatch Logs console and to AWS SNS:

Bryte Event         Description
LogfileProcessed    Archive log file processed (Oracle only)
TableExtracted      Source table extract complete, SQL Server and Oracle (initial extracts only)
ExtractCompleted    Source extraction batch is complete
TableLoaded         Destination table load is complete
LoadCompleted       All destination table loads in a batch are complete
HaltError           An unrecoverable error occurred and the Scheduler was turned OFF
RetryError          An error occurred but will be retried

Below are the details for each of the Bryte Events:

  Event: LogfileProcessed

Attribute           Is Metric (Y/N)?  Description
type                N                 "LogfileProcessed"
generated           N                 Timestamp of message
source              N                 Instance name
sourceType          N                 "CDC"
fileSeq             N                 File sequence
file                N                 File name
dictLoadMS          Y                 Time taken to load dictionary in ms
CurrentDBDate       N                 Current database date
CurrentServerDate   N                 Current Bryte server date
parseMS             Y                 Time taken to parse file in ms
parseComplete       N                 Timestamp when parsing is complete
sourceDate          N                 Source date

  Event: TableExtracted

Attribute           Is Metric (Y/N)?  Description
type                N                 "TableLoaded"
subType             N                 Table name
generated           N                 Timestamp of message
source              N                 Instance name
sourceType          N                 "CDC"
tabName             N                 Table name
success             N                 true/false
message             N                 Status message
sourceTS            N                 Source date time
sourceInserts       Y                 No. of Inserts in source
sourceUpdates       Y                 No. of Updates in source
sourceDeletes       Y                 No. of Deletes in source

  Event: ExtractCompleted

Attribute           Is Metric (Y/N)?  Description
type                N                 "ExtractCompleted"
generated           N                 Timestamp of message
source              N                 Instance name
sourceType          N                 "CDC"
jobType             N                 "EXTRACT"
jobSubType          N                 Extract type
success             N                 Y/N
message             N                 Status message
runId               N                 Run Id
sourceDate          N                 Source date
dbDate              N                 Current database date
fromSeq             N                 Start file sequence
toSeq               N                 End file sequence
extractId           N                 Run id for extract
tableErrors         Y                 Count of table errors
tableTotals         Y                 Count of total tables

  Event: TableLoaded

Attribute           Is Metric (Y/N)?  Description
type                N                 "TableLoaded"
subType             N                 Table name
generated           N                 Timestamp of message
source              N                 Instance name
sourceType          N                 "CDC"
tabName             N                 Table name
success             N                 true/false
message             N                 Status message
sourceTS            N                 Source date time
sourceInserts       Y                 No. of Inserts in source
sourceUpdates       Y                 No. of Updates in source
sourceDeletes       Y                 No. of Deletes in source
destInserts         Y                 No. of Inserts in destination
destUpdates         Y                 No. of Updates in destination
destDeletes         Y                 No. of Deletes in destination

  Event: LoadCompleted

Attribute           Is Metric (Y/N)?  Description
type                N                 "LoadCompleted"
generated           N                 Timestamp of message
source              N                 Instance name
sourceType          N                 "CDC"
jobType             N                 "LOAD"
jobSubType          N                 Sub type of the "LOAD"
success             N                 Y/N
message             N                 Status message
runId               N                 Run Id
sourceDate          N                 Source date
dbDate              N                 Current database date
fromSeq             N                 Start file sequence
toSeq               N                 End file sequence
extractId           N                 Run id for extract
tableErrors         Y                 Count of table errors
tableTotals         Y                 Count of total tables

  Event: HaltError

Attribute           Is Metric (Y/N)?  Description
type                N                 "HaltError"
generated           N                 Timestamp of message
source              N                 Instance name
sourceType          N                 "CDC"
message             N                 Error message
errorId             N                 Short identifier

  Event: RetryError

Attribute           Is Metric (Y/N)?  Description
type                N                 "RetryError"
generated           N                 Timestamp of message
source              N                 Instance name
sourceType          N                 "CDC"
message             N                 Error message
errorId             N                 Short identifier

FAQ
