ETL with Arbutus: Better Quality at a Fraction of the Price

At many organizations, critical data resides on mainframe computers that could be of great value if it were more easily accessible. Traditionally, the best way to access non-relational legacy data like this was to perform an extract, transform, and load (ETL) to a data mart.

Arbutus offers two alternatives that provide improved accuracy, faster access, and fewer headaches than any popular ETL solution on the market – at a fraction of the cost.

The Problem: Mainframe Data ETL = Business Counter-Intelligence

The ETL process is costly, time-consuming, and error-prone. That’s not an acceptable solution for today’s executives and decision makers, who require real-time access to accurate data in order to make intelligent choices.

Here are just some of the risks associated with ETL:

  • Most business intelligence (BI) and data profiling tools cannot directly or accurately access legacy data, so project managers start blind, guided only by source documentation and subject matter experts, if available
  • Undocumented changes to the data format or field contents can be hard to identify and address, causing delays
  • Data quality can be worse than expected, necessitating extensive ETL testing and data cleansing
  • Legacy data can include unexpected or undocumented transaction types, or other anomalies, resulting in changes to the ETL and mapping
  • Because the ETL process is time-consuming for both people and computers (i.e., mainframe cycles), often only slices of data are accessed instead of all the necessary data, which can mean repeating the entire process to obtain data missed the first time around

The bottom line is simply this: for many project managers, accessing and using legacy data is synonymous with risk, complexity and cost.

The Solution: Minimize the Challenges, Risks & Costs Associated with Mainframe Legacy Data

Arbutus technology offers two alternatives for a dramatic improvement over standard ETLs:

1) Use Arbutus prior to an ETL to reduce errors, project hours and risk    

Arbutus uses a read-only copy of the actual source data – not test data – so you can identify and correct data quality issues immediately. This data-driven methodology reduces the costs and risks of ETLs.

2) Use Arbutus instead of performing an ETL, and get equivalent results at a fraction of the cost

Arbutus can read source data, such as relational or non-relational legacy mainframe, AS/400 (iSeries), or Oracle data, providing SQL and query access within standard Windows apps (e.g., Excel, Access, or Crystal Reports), so you don’t have to perform ETL.

Advanced Capabilities

Arbutus technology is easy to implement and deploy, giving organizations flexibility in how they provide access to their legacy data, whether it’s on the mainframe (zSeries, MVS), AS/400 (iSeries), or Windows. Below are some of the advanced capabilities of Arbutus that make data access and manipulation faster, easier, and more economical than other solutions on the market.

  • Arbutus Language – commands specifically designed for legacy data analysis and manipulation
  • Data Definition Wizard – an expert system for defining metadata step-by-easy-step
  • Procedures – allow even complex processes to be triggered automatically
  • Groups – for executing multiple commands during one pass of the data to optimize performance
  • Semantic Layer – a bridge between the physical expression of the data and the logical representation seen by users
  • Data Relations – supports star schema modeling, even across disparate data sources
  • Virtual Columns – dynamically process any calculation, such as harmonizing keys across systems
  • Conditional Columns – can dynamically take on a value based on conditional logic
  • Automated Metadata Conversion – automatically handled by the Data Definition Wizard
  • Complex File-Type Support – ensures even the most complex data files can be processed
  • Multiple Record-Type Support – filter a single record type or combine data from multiple record types

 

Arbutus Language

COMMANDS SPECIFICALLY DESIGNED FOR LEGACY DATA ANALYSIS AND MANIPULATION

The Arbutus technology contains a wide variety of commands specifically designed for data analysis and manipulation, including:

Data Output Commands

  • REPORT – provides for basic reporting, with automatic formatting
  • EXPORT – creates data in most popular PC-based formats

Data Manipulation Commands

  • EXTRACT – creates a subset of a table, not unlike SELECT WHERE
  • SUMMARIZE – creates subtotals by key value, not unlike GROUP BY
  • INDEX – logically re-arranges the data, based on one or more keys
  • SORT – creates a physically re-arranged copy of the data, based on one or more keys
  • RELATION – allows multiple files to be connected in a “star schema” style
  • JOIN – combines two dissimilar files based on common key(s)
  • MERGE – combines two files with identical structures, based on common key(s)
  • SAMPLE – creates a statistical subset of a table

Data Analysis Commands

  • CLASSIFY – one of the most powerful analytic commands in the Arbutus command set. It allows a table to be grouped and totaled on a key field, even when the table is not physically arranged in that order. It does this without rearranging the table first, by maintaining the totals independently from the file. The result is that most Sort/Summarize operations can be replaced by a simple Classify.
  • CROSS TABULATE – extends CLASSIFY to a two-dimensional slice, without sorting
  • COUNT – determines the number of rows that match any criteria
  • TOTAL – accumulates numeric column(s) that match any criteria
  • STATISTICS – provides an overview of numeric column(s)
  • STRATIFY – determines the distribution of values for numeric column(s)
  • AGE – provides distribution information based on dates
  • BENFORD – provides a specialized statistical analysis of the data
  • SEQUENCE – confirms that a table is arranged in the order it is expected to be
  • GAPS – identifies any gaps in sequences that are expected to be complete
  • DUPLICATES – identifies any duplicates in sequences that are expected to be unique values
  • VERIFY – tests for physical data corruption in source files
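As a concrete illustration of how CLASSIFY can replace a Sort/Summarize, here is a minimal Python sketch (the table, field names, and data are hypothetical) of grouping and totaling in a single pass, with the totals maintained independently of the physical row order:

```python
from collections import defaultdict

def classify(rows, key, total_field):
    """Group and total rows on a key field in a single pass,
    without sorting: totals are accumulated independently of
    the physical row order, as CLASSIFY does conceptually."""
    totals = defaultdict(lambda: {"count": 0, "total": 0})
    for row in rows:
        bucket = totals[row[key]]
        bucket["count"] += 1
        bucket["total"] += row[total_field]
    return dict(totals)

# Rows arrive in arbitrary physical order; no prior SORT is needed.
sales = [
    {"state": "NY", "amount": 500},
    {"state": "CA", "amount": 200},
    {"state": "NY", "amount": 300},
]
print(classify(sales, "state", "amount"))
# {'NY': {'count': 2, 'total': 800}, 'CA': {'count': 1, 'total': 200}}
```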

 

 

Data Definition Wizard

AN EXPERT SYSTEM FOR DEFINING METADATA STEP-BY-EASY-STEP

Often, the most difficult part of accessing legacy data is simply identifying the source data and creating appropriate metadata. This is where our Data Definition Wizard comes in.

The Data Definition Wizard is actually an expert system packaged as a Wizard. It leads you through a step-by-step process of selecting the data and then defining the metadata for each table definition you choose to create.

Every step along the way, this expert system analyzes the choices that are available, and makes a recommendation based on the data and the previous choices made. In this way, even users who are not expert in data issues can create appropriate metadata. The entire process is presented as a Windows Wizard, with help available at each stage to streamline the task.

When creating links to DB2, you are able to use ‘Speed Search’ to reduce the table list displayed. You may also join multiple DB2 tables into one virtual table.

When creating links to IMS, you may choose either the single segment selected, or the full path to the root.

The Data Definition Wizard also adds a Windows-like file selection dialog, to make the selection of mainframe files as easy as possible for non-mainframe users.

 

 

Procedures

ALLOW EVEN COMPLEX PROCESSES TO BE TRIGGERED AUTOMATICALLY

Arbutus supports the creation of stored procedures, which allow for repeated or automatic execution of Arbutus commands. Like SQL, the Arbutus language contains English-like commands, but has been enhanced with specific data analysis and manipulation capabilities.

Procedures may be created manually or graphically. Arbutus procedures may contain any Arbutus commands, can call other procedures, and may even allow interaction with users.

Automatically Trigger Complex Processes with Procedures
Arbutus procedures may be automatically invoked whenever a particular table is referenced. This capability allows even the most complex processes to be triggered automatically, as required.

Procedures in Detail

  • Sub-procedures allow encapsulation of behavior
  • Looping allows for iterative processing of data groups
  • Variables allow advanced calculations that can contain persistent values
  • Grouping allows multiple commands to be executed in one pass of the table
  • Flow control allows commands, groups of commands, or sub-procedures to be conditionally executed
  • Dialogs allow a run-time user interface to the procedure, to enter parameters or specifications

Example Procedure
Below is an example of a simple procedure that uses a table (Sales) to create a new table (Large_NY) that displays all sales over $10,000 in New York, sorted in descending order of size:

OPEN Sales
SORT ON Sales_Amt D TO Large_NY IF Sales_Amt>10000 AND State="NY"
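For readers more familiar with conventional languages, here is a rough Python equivalent of the same filter-and-sort (the rows are hypothetical stand-ins for the Sales table):

```python
# Hypothetical sales rows standing in for the Sales table.
sales = [
    {"state": "NY", "sales_amt": 25000},
    {"state": "CA", "sales_amt": 40000},
    {"state": "NY", "sales_amt": 9000},
    {"state": "NY", "sales_amt": 12000},
]

# Equivalent of: SORT ON Sales_Amt D TO Large_NY IF Sales_Amt>10000 AND State="NY"
large_ny = sorted(
    (r for r in sales if r["sales_amt"] > 10000 and r["state"] == "NY"),
    key=lambda r: r["sales_amt"],
    reverse=True,  # D = descending order of size
)
print(large_ny)
# [{'state': 'NY', 'sales_amt': 25000}, {'state': 'NY', 'sales_amt': 12000}]
```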

 

 

Groups

FOR EXECUTING MULTIPLE COMMANDS DURING ONE PASS OF THE DATA TO OPTIMIZE PERFORMANCE

Arbutus procedures support the concept of grouping multiple commands for execution in one pass of the file. There are two primary uses for this technology:

  • First, dramatic performance improvements are possible, as the data access component, which is shared across the commands, typically makes up the largest part of the processing load
  • Second, for more complex file types with multiple record types, your processing of a particular record type may depend on separate, and different, processes based on other preceding record types

Groups also support the conditional execution of sub-groups, using an “if-then-else” style. The result is an easy implementation of case-style logic that separates processing steps for each record type or condition. This supports the accessing of even the most complex file structures.
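Conceptually, a group behaves like the following Python sketch (the record types and fields are hypothetical): several accumulations run during a single pass of the data, and an if-then-else separates the processing for each record type:

```python
def one_pass(records):
    """Apply several accumulations in one pass over the data,
    branching on record type, as Arbutus groups do conceptually."""
    totals = {"headers": 0, "detail_total": 0, "detail_count": 0}
    for rec in records:
        # Conditional sub-groups: separate processing per record type.
        if rec["type"] == "H":
            totals["headers"] += 1
        else:
            totals["detail_total"] += rec["amount"]
            totals["detail_count"] += 1
    return totals

recs = [
    {"type": "H"},
    {"type": "D", "amount": 10},
    {"type": "D", "amount": 5},
]
print(one_pass(recs))
# {'headers': 1, 'detail_total': 15, 'detail_count': 2}
```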

 

 

Semantic Layer

A BRIDGE BETWEEN THE PHYSICAL EXPRESSION OF THE DATA AND THE LOGICAL REPRESENTATION SEEN BY USERS

A powerful semantic layer provides the bridge between the physical expression of the data and the logical representation that users see. This is used to increase the value that can be derived from the data, by making the logical content of the data more obvious.

For example, rather than referring to tables and columns by their (often cryptic) internal system names, each can be supplied with a name that is most meaningful to the user community. This allows the users to more readily understand the data content, and most easily work with it.

There are a number of capabilities that are significant to Arbutus’s semantic layer, including:

  • Any table may be given a name that is most meaningful to the users
  • Each user only sees tables that are relevant to their needs
  • Each column in a table can be given a name that is most meaningful to the users
  • Each user only sees columns that are relevant to their needs
  • Each user group can have its own independent table names, column names, and subsets
  • Virtual Columns may be created that augment the physical information in the table

 

 

Data Relations

SUPPORTS STAR SCHEMA MODELING, EVEN ACROSS DISPARATE DATA SOURCES

Data Relations allow you to define logical relationships between disparate data sources that share a common key or keys. The implementation of this technology is in a dimensional modeling, or “star schema” format, which matches fact and dimension files based on keys.

This approach is similar to technologies implemented in most RDBMS (relational database management system) applications, but the difference is that Arbutus directly accesses the original files, without importing them. When the keys in two files do not match precisely, you can employ virtual columns to dynamically transform and harmonize the keys.

When the host system data already contains native key information, Arbutus will automatically utilize this to optimize the Data Relations processing. For QSAM or other flat files, where native keys do not exist, Arbutus can create custom indices to optimize performance.

Any of the many file types that Arbutus can access can be directly related, in any combination, allowing for unlimited flexibility in combining the information from different applications or programming languages. Using this technology, disparate data sources may be easily combined.

Since Data Relations can be defined in any combination, you can model your data architecture regardless of its complexity. Both “star schema” and “snowflake schema” modeling are supported, so a model of any complexity can be implemented. As the fact tables are processed, the related dimension tables are automatically synchronized.
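The key-harmonization idea can be sketched in Python (the tables and keys are hypothetical): the dimension table stores a zero-padded text key, the fact rows store an integer, and a virtual-column-style function bridges the two at lookup time:

```python
# Dimension table keyed on a 5-character, zero-padded customer number.
customers = {
    "00042": {"name": "Acme Corp", "region": "East"},
}

# Fact rows store the key as an integer, so the keys do not match
# precisely; a virtual-column-style function harmonizes them.
orders = [{"cust_no": 42, "amount": 1500}]

def harmonized_key(row):
    return f"{row['cust_no']:05d}"  # integer -> zero-padded text key

for order in orders:
    dim = customers[harmonized_key(order)]
    print(order["amount"], dim["name"], dim["region"])
# 1500 Acme Corp East
```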

 

 

Virtual Columns

DYNAMICALLY PROCESS ANY CALCULATION, SUCH AS HARMONIZING KEYS ACROSS SYSTEMS

In addition to supporting an unparalleled range of physical field types, Arbutus also supports virtual columns. Virtual columns allow the result of any calculation(s) to be presented as a column in a table. Virtual columns are implemented by Arbutus purely as definitions, stored in the table’s metadata. Virtual columns operate automatically and dynamically, with no procedural code required. There is no physical data actually created, and no pre-processing requirements or delays. Once defined, virtual columns are automatically exposed through the table definition and can be referenced just like any physical column.

What is a Virtual Column?

If you have an inventory table, for example, that contains physical columns specifying the quantity on hand (QTY) and the unit price (PRICE), then you can define a simple virtual column VALUE with the definition QTY*PRICE. As soon as the column is defined, you can refer to VALUE as if it were a physical column in the table, as Arbutus does not differentiate between physical and virtual columns. While the above example is very simple, virtual columns can contain the value of any expression, regardless of complexity.

In addition, virtual columns can be multi-valued. For this type of virtual column, you may specify any number of tests which will be sequentially evaluated until one is true or the list is exhausted. Each test is paired with a separate value expression. The value of the virtual column is determined by which test (if any) succeeds, on a row-by-row basis, providing comprehensive testing capabilities. Virtual columns may refer to any value or column, including other virtual or related columns. This results in the ability to establish even complex data relationships through virtual columns. Any calculation or relationship inherent in your data can be modeled as a virtual column, allowing Arbutus to not only access the data itself, but also the meaning implicit in that data.
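A multi-valued virtual column can be pictured as an ordered list of test/value pairs, evaluated row by row until one test succeeds. The following Python sketch (the column names and bands are hypothetical) mimics that behavior:

```python
def virtual_column(tests):
    """Build a multi-valued virtual column: a list of (test, value)
    pairs evaluated in order until one test succeeds, row by row.
    Returns None if the list is exhausted with no test succeeding."""
    def evaluate(row):
        for test, value in tests:
            if test(row):
                return value(row)
        return None
    return evaluate

# Hypothetical size band derived from a quantity column.
size_band = virtual_column([
    (lambda r: r["qty"] >= 100, lambda r: "bulk"),
    (lambda r: r["qty"] >= 10,  lambda r: "case"),
    (lambda r: True,            lambda r: "unit"),
])

print(size_band({"qty": 250}))  # bulk
print(size_band({"qty": 3}))    # unit
```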

Virtual columns are useful for a variety of applications. They can:

  • Encapsulate business logic calculations
  • Expand coded values, such as department codes, into relevant names
  • Provide semantically relevant names for various business logic conditions
  • Act as automatic triggers to implement specific business logic
  • Identify error conditions in the table

 

 

Conditional Columns

CAN DYNAMICALLY TAKE ON A VALUE BASED ON CONDITIONAL LOGIC

Conditional columns − virtual or physical − can be created that have conditionally triggered values, with any level of complexity on the triggers. This capability can be used to embed any procedural or business logic directly into a table’s definition metadata.

Conditional columns are particularly useful to support the definition of multiple record type files, which are quite common in legacy application environments. When a file contains multiple record types, Arbutus allows the column definitions for all record types to be concurrently active. Columns that relate to only one record type, or to specific record types, automatically contain null values when processing other record types, ensuring the integrity of the results. Each column effectively has a separate criterion for validity, and therefore an unlimited number of record types or conditions are supported.

In addition to taking on conditional values, conditional columns may also be static. With static conditional columns, rather than taking on a null value, the last valid column value automatically propagates forward to subsequent records until another valid value is encountered.

Record types tend to fall into two broad categories: transaction records, whose information only relates to the record at hand, or header records, whose information relates to the next set of records until another header is encountered. Through the use of static conditional columns, Arbutus can effectively flatten a multiple record type file containing any number of types into a single virtual record.
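The forward propagation of a static conditional column can be sketched in Python (the record layout is hypothetical): the last valid header value carries forward onto each transaction record, flattening the file into a single virtual record:

```python
def flatten(records):
    """Carry header values forward onto transaction records, the way
    a static conditional column propagates its last valid value."""
    current_batch = None  # last valid header value seen so far
    flat = []
    for rec in records:
        if rec["type"] == "H":
            current_batch = rec["batch"]  # new valid value encountered
        else:
            flat.append({"batch": current_batch, "amount": rec["amount"]})
    return flat

mixed = [
    {"type": "H", "batch": "B1"},
    {"type": "T", "amount": 10},
    {"type": "T", "amount": 20},
    {"type": "H", "batch": "B2"},
    {"type": "T", "amount": 30},
]
print(flatten(mixed))
# [{'batch': 'B1', 'amount': 10}, {'batch': 'B1', 'amount': 20},
#  {'batch': 'B2', 'amount': 30}]
```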

 

 

Automated Metadata Conversion

AUTOMATICALLY HANDLED BY THE DATA DEFINITION WIZARD

Metadata conversion is automatically handled by the Arbutus Data Definition Wizard during the table definition process. Most language or system constructs are automatically converted from the host environment directly into metadata compatible with Arbutus. This includes:

COBOL

  • Copybook is directly read from host, or may be downloaded and read locally
  • Fixed-length or variable-length file definitions
  • All supported data types and data formats
  • Level 88 fields, including single or multi-valued fields
  • REDEFINES of entire record, group of fields, or individual field
  • OCCURS, including nested OCCURS
  • Multiple copybooks may be concatenated and treated as:
    • Separate record definitions
    • A single record definition

PL/1

  • Copybook is directly read from host, or may be downloaded and read locally
  • Fixed-length or variable-length file definitions
  • All supported data types and data formats
  • Supports both F-format and V-format input record formats
  • Structures
  • Arrays
  • Multiple copybooks may be concatenated and treated as:
    • Separate record definitions
    • A single record definition

DB2

  • Table names
  • All column attributes are automatically converted, including:
    • Name
    • Length
    • Type
    • Decimals

IMS

  • Automatic DBD processing to determine and convert:
    • Segment names
    • Hierarchical paths
    • Field information, including:
      • Keys
      • Name
      • Type
      • Length
      • Decimals

Easytrieve

  • Reads field information from source, including:
    • Name
    • Type
    • Start
    • Length
    • Decimals

Where your data definitions are stored in an unsupported format, or are not available electronically, the Data Definition Wizard supports a graphically assisted field definition process.

While not fully automated, virtually any metadata that is available in electronic form can be converted to metadata compatible with Arbutus. This includes data dictionary reports, metadata stored in PC files, such as a Word document, or documentation pages printed to a file, like SMF record layouts from IBM documentation.

For any metadata, whether created manually or automatically, the administrator may:

  • delete any definitions never to be used
  • add any definitions, as required
  • rename any definitions, to support ease of use
  • temporarily hide any definitions, which may be required in the future

 

 

Multiple Record-Type Support

FILTER A SINGLE RECORD TYPE OR COMBINE DATA FROM MULTIPLE RECORD TYPES

In addition to supporting complex file structures and a host of field types, Arbutus has specific technologies to support multiple record type files, and to present their contents as a virtual flat record. Multiple Record Type processing allows Arbutus either to filter a single record type from a file or to combine data from multiple record types into a single unified tabular view for processing.

Arbutus technology includes the automatic recognition of any number of different record types and the selective conversion of appropriate data, all with no user intervention required.

Record types tend to fall into two broad categories: transaction records, containing information that only relates to the record at hand, or header records, with information that relates to the next set of records until another header is encountered. Through the use of conditional columns, Arbutus dynamically propagates information from header records to the subsequent records they relate to, without the need to transform the data. The result is that all relevant data is available for each record.

 

Scenarios

The Arbutus technology allows any Windows application to directly connect to almost any IBM mainframe (zSeries) or AS/400 (iSeries) legacy data, without the high cost, complexity, or prolonged implementation cycle typical of legacy data access. This includes non-relational (flat) files on the mainframe, in addition to VSAM, IMS, ADABAS, or DB2. Data from any of these sources can appear as a relational table in a unified database. To a Windows application, this virtual database is indistinguishable from a data warehouse.

Arbutus’ data access technology offers compelling benefits when addressing challenges in a variety of different application areas. Following is a summary of some important application areas, together with common challenges that Arbutus technology addresses.

 

BI (Business Intelligence)

THE CHALLENGE: INEXPENSIVE ACCESS TO VAST LEGACY DATA STORES

One of the major challenges in business intelligence (BI) today is incorporating all the data necessary to manage your business effectively. While your data repository contains a wealth of information, there is still significant value locked in your relatively inaccessible, non-warehoused legacy data.

The problem is, when you require legacy data to be added to the repository, it is often difficult to build a business case, due to the high costs. As a result, you often don’t get access to the data you need.

  • What you need is access to these vast reserves of data, without the typical costs or delays.

THE SOLUTION

Arbutus connects even your most complex mainframe legacy data to Windows-based applications. We offer a range of products, depending on your particular needs, including Arbutus Connect, middleware that allows your existing Windows tools to directly access mainframe data.

 

Data Conversion

THE CHALLENGE: EFFICIENTLY CONVERTING YOUR MAINFRAME LEGACY DATA

Data conversion is one of the more challenging tasks an IT department can face. The most common problem is the lack of appropriate tools to get the job done. Many organizations rely on labor-intensive processes, such as COBOL programs, which can be difficult to use, expensive, or both.

The problem in many cases is that the data conversion project addresses a specific, narrow situation, and the benefits don’t justify any of the costly alternatives available.

  • You need a tool that is easy to learn and use, powerful enough to read any mainframe data source, flexible enough for any situation, and cost-effective to deploy: Arbutus Migrate.

THE SOLUTION

Arbutus specializes in accessing non-relational legacy mainframe and AS/400 data. Arbutus Migrate is a simple, powerful, timely and affordable solution designed for the challenges of data conversion.

Arbutus Migrate includes:

  • A Windows-based query and conversion tool specifically designed for mainframe data access
  • A mainframe server which provides comprehensive data access to virtually any mainframe data source, including DB2, IMS and even the most complex VSAM and sequential (QSAM) flat files
  • Our unique LegacyLink™ ODBC driver, which provides Windows applications (e.g., Excel, Access, and Crystal Reports) with direct access to your mainframe data, often eliminating the need for data conversion entirely.


In addition to accessing the data, Arbutus offers semantic and transformation layers that allow the implementation of virtually any data interface or business rule. This can dramatically enhance your ability to provide the data in the required format.

Data Migration

THE CHALLENGE: ACCESSING YOUR LEGACY DATA WITH MODERN TOOLS

Systems conversion is one of the more challenging tasks an IT department can face. Unfortunately, more often than not, it is forced upon you. You may have acquired a company with incompatible systems, your existing application may no longer meet the organization’s needs, or there may have been a mandated decision from above. In any case, you need to convert systems and time is of the essence.

The problems are many:

  • you have incomplete information on the source system
  • you have inadequate tools to access the source data
  • the developers may no longer be with the organization
  • there are undocumented business rules that only exist in the applications themselves
  • the data quality does not meet the standards or requirements of the new system

These problems are exacerbated when dealing with mainframe legacy data.

  • If only the source system used a relational database to store the data, you would then have ready access to the data, and could apply modern tools to the task.

THE SOLUTION

Arbutus specializes in accessing non-relational mainframe and AS/400 legacy data. Arbutus Migrate is a simple, powerful, timely and cost-effective solution, designed for the challenges of data migration.

Arbutus Migrate is made up of two components: The first, LegacyLink, is middleware that allows your present Windows tools to directly access virtually any legacy data. The second component, Analyzer, is a separate Windows end-user tool that allows an extensive range of analytics to be performed on your source data.

In addition to accessing the data, Arbutus offers semantic and transformation layers that allow the implementation of virtually any data interface or business rule. This can dramatically enhance your ability to analyze and migrate your data.

Data Mining

THE CHALLENGE: ALLOWING TOOLS ACCESS TO ALL YOUR LEGACY DATA

As the saying goes: “garbage in, garbage out”. The quality of your data mining results is often proportional to the quality (and quantity) of the data supplied. The problem is, your data mining tools are generally limited to accessing your data warehouse. They can’t get at the vast stores of operational and other data in your legacy systems.

The reason for this is cost. The cost to access legacy systems often means that a business case is difficult to build. But the problem is, the absence of this legacy data can severely hamper, or even cripple, your data mining efforts.

  • Imagine if any data you wanted to access could be made available, quickly and inexpensively, directly into your data mining tools.

THE SOLUTION

Arbutus specializes in accessing non-relational legacy mainframe and AS/400 data. Arbutus Connect is middleware specifically tailored to make that data accessible by Windows-based applications. The solution is simple, powerful, timely and cost-effective.

Arbutus reads virtually any mainframe data directly and presents it as a virtual relational database, almost indistinguishable from a data mart. This virtual database is instantly accessible by any Windows-based application, without any of the complexities you might expect.

This means that you can integrate all your mainframe data with your data mining tools, without the need to extend your warehouse, or incur the costs or delays.

In addition to accessing the data, Arbutus offers semantic and transformation layers that allow the implementation of virtually any data interface or business rule. You are not limited to merely exposing the data, but can actually fit it to your needs.

DQM (Data Quality Management)

THE CHALLENGE: ENSURING ETL TRANSFORMS DON’T MASK DATA QUALITY PROBLEMS

Your Data Quality Management (DQM) tools provide ever-increasing capabilities. The problem is, they are not equipped to read your legacy data. Legacy systems have been in place for years, sometimes decades, and often do not enforce the level of data quality that we would insist on for a new system.

Ironically, this means that the systems most in need of data quality reviews are the least accessible.

Often, the critical data from these systems has been replicated in your data warehouse. You might assume that testing this copy of the data is sufficient, but unfortunately it may not be an exact copy. ETL transformations are often required to load the data warehouse, and these can very easily (and often do) obscure important data quality issues.

For example, a GENDER field may have been populated in the data warehouse with logic like:

  • if field = M, then gender = M, else gender = F

If the legacy system allowed blanks or “?” to be entered, these values would be lost in the data warehouse, appearing instead as female. You can imagine how a more complex transform could mask more serious errors.

Even this trivial example could lead to poor business decisions based on the gender of your customers.
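The masking effect is easy to demonstrate. In this Python sketch (the source values are hypothetical), the warehouse transform from the example above silently converts two invalid source values into valid-looking ones:

```python
source = ["M", "F", " ", "?", "F"]  # legacy field allows blanks and "?"

# The warehouse transform from the example: anything not "M" loads as "F".
warehouse = ["M" if g == "M" else "F" for g in source]

bad_at_source = sum(1 for g in source if g not in ("M", "F"))
bad_in_warehouse = sum(1 for g in warehouse if g not in ("M", "F"))

print(bad_at_source)     # 2 -> visible only when testing at the source
print(bad_in_warehouse)  # 0 -> the transform has masked both errors
```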

  • What is needed is a way to apply your DQM tools directly to the legacy systems, and identify issues at the source. As well, such a solution could allow you to access other data not in the warehouse, either for reference or as additional DQM opportunities.

THE SOLUTION

Arbutus connects even your most complex mainframe legacy data to Windows. We offer a range of products, depending on your particular needs:

  • Arbutus Connect is middleware that allows your existing Windows tools to directly access the mainframe data.
  • Arbutus Integrate allows you to integrate legacy system testing into your standard processes.
  • Arbutus Query is an end-user Windows application that provides comprehensive query and reporting capabilities. This can be used to supplement your existing processes.
  • Arbutus Migrate allows you to control systems conversion, to ensure data quality issues are found.

Data Warehousing

THE CHALLENGE: HOW TO VALIDATE REQUIREMENTS BEFORE IMPLEMENTATION

Depending on your definition of failure, data warehouses have a significant – even alarmingly high – failure rate. Data mart models can improve your odds, but not as much as you might hope.

In many cases, problems occur because the requirements change during implementation. In others, the initial request is incomplete or inadequate, but this is usually not discovered until after the implementation, when users start working with the warehoused data.

These issues are not unique to data warehousing, but they can be exacerbated by the typically long implementation cycle. One of the major factors affecting the length of implementation can be integration with mainframe legacy data.

  • What is needed is a way of quickly prototyping your data warehouse or data mart, so you can validate the requirements before you start your implementation. Prototyping is an approach often taken in other software development areas.

THE SOLUTION

Arbutus specializes in accessing non-relational legacy mainframe and AS/400 data. Arbutus Connect is middleware specifically tailored to move that data directly to a Windows environment. The solution is simple, powerful, timely and cost-effective.

Arbutus reads virtually any mainframe data directly and presents it as a virtual relational database, almost indistinguishable from a data mart. This virtual database is instantly accessible by any Windows-based application, without any of the complexities you would typically encounter.

This means that you can prototype much of your data mart very quickly. Not only can you validate the request, but you can work with the system for a period of time to discover any additional requirements or to monitor query patterns.

In addition to accessing the data, Arbutus offers semantic and transformation layers that allow the implementation of virtually any data interface or business rule. You are not limited to merely exposing the data, but can dynamically fit it to your needs.

EII (Enterprise Information Integration)

THE CHALLENGE: INTEGRATION OF DISPARATE MAINFRAME DATA SOURCES

While your organization has vast amounts of data, it is often scattered across countless disparate systems. This distribution of data presents major challenges when attempting to integrate the data into a unified whole. These challenges are compounded for mainframe legacy data, which tends to be difficult and expensive to access.

One solution is to warehouse the most important data from each of these systems. Unfortunately, this process is both costly and time-consuming. In addition, because of the cost of the data repository, the resulting system usually limits the available data to those elements considered most important. As a result, this approach is inflexible and has difficulty meeting the evolving needs of your business.

  • EII (Enterprise Information Integration) is a new market segment that specifically addresses the integration of these disparate data sources. It creates a virtual database that in theory can integrate all the disparate data. Unfortunately, the majority of tools in this segment let you down when addressing mainframe legacy data.

THE SOLUTION

Arbutus specializes in accessing non-relational legacy mainframe and AS/400 (iSeries) data. Arbutus Connect is middleware specifically tailored to make that data accessible by Windows-based applications. The solution is simple, powerful, timely and cost-effective.

Arbutus reads virtually any mainframe data directly and presents it as a virtual relational database, almost indistinguishable from a data mart. This virtual database is instantly accessible by any Windows-based application, without any of the complexities you would typically encounter.

This means that you can integrate all your mainframe data into a virtual data warehouse, without the need to actually build a warehouse or incur its costs and delays.

In addition to accessing the data, Arbutus offers semantic and transformation layers that allow the implementation of virtually any data interface or business rule. You are not limited to merely exposing the data, but can actually fit it to your needs.

ETL (Extract, Transform, Load)

THE CHALLENGE: A COST-EFFECTIVE MEANS TO ACCESS & TRANSFORM MAINFRAME LEGACY DATA

When integrating legacy data into your data warehouse or data mart, the unfortunate fact is that many of the “standard” extract, transform and load (ETL) solutions are complex, time-consuming, and costly. For these reasons, they are typically implemented only for the data sources judged to be the most critical – or, sadly, the most easily accessible.

This situation is particularly true with respect to your legacy mainframe systems. Given the costs, you can only make a business case for the most valuable data, and even then you have to wait for the process to run its course.

But what about all the data elements that have value, but not enough to justify the expense of modifying your ETL processes?

  • What is needed is a solution that lowers the barriers to accessing your legacy data, one that is quicker and more cost-effective, so that it is easier to make a case for accessing all of the data that you need, when you need it.

THE SOLUTION

Arbutus specializes in accessing non-relational legacy mainframe and AS/400 data. Arbutus Connect is middleware specifically tailored to move legacy data directly to the Windows platform. The solution is simple, powerful, timely and cost-effective. In cases in which your target is not Windows-based, Arbutus can still read the source data, but instead provides it in any standard format for loading.

In addition to accessing the data, Arbutus offers semantic and transformation layers that allow the implementation of virtually any data interface or business rule. You are not limited to merely exposing the data, but can dynamically fit it to your needs.

Exception Reporting

THE CHALLENGE: SUPPORTING BUSINESS LOGIC TRIGGERS, WITHOUT DATA REPOSITORIES

Exception reporting and logic triggers are powerful tools in managing a business. The problem is, it can be very difficult, expensive, and time-consuming to utilize data that is not in some type of data repository.

Since repositories are expensive, only the most important data is typically included. This may hamper your ability to implement business rules, because the required data is missing.

  • What is needed is a means to utilize these legacy data sources quickly and inexpensively. This would allow many more opportunities to utilize these powerful techniques.

THE SOLUTION

Arbutus specializes in accessing non-relational legacy mainframe (zSeries) and AS/400 (iSeries) data. Arbutus Connect is middleware specifically tailored to make that data accessible by Windows-based applications. The solution is simple, powerful, timely and cost-effective.

Arbutus reads virtually any mainframe data directly and presents it as a virtual relational database, almost indistinguishable from a data mart. This virtual database is instantly accessible by any Windows-based application, without any of the complexities you would typically encounter.

This means that you can integrate all your mainframe data with your tools, without the need to extend your warehouse or incur its costs and delays.

In addition to accessing the data, Arbutus offers semantic and transformation layers that allow the implementation of virtually any data interface or business rule. You are not limited to merely exposing the data, but can actually fit it to your needs.

Query & Analysis

THE CHALLENGE: SUPPORTING DEPARTMENTAL INFORMATION REQUIREMENTS WITHOUT DATA MARTS

Your ability to manage is directly related to the quantity and quality of the information at your disposal. While your organization stores vast amounts of information, not all of it is readily accessible. The data warehouse or data marts may be easiest to access, but they typically contain only a small fraction of all your corporate data.

The problem is, creating a data repository is a costly and time-consuming process, both to implement and maintain. As a result, to keep the scope manageable, only the most valuable data is included.

These compromises mean that even when the repository is implemented, it can still fail to meet all your query and analysis needs.

  • What you need is an easily implemented, cost-effective data access solution that provides the data you need, when you need it.

THE SOLUTION

Arbutus connects even your most complex mainframe legacy data to Windows-based applications. We offer a range of products, depending on your particular needs, including Arbutus Connect, which is middleware that allows your existing Windows tools to directly query the mainframe data.

Virtual Data Marts

THE CHALLENGE: HOW TO ADDRESS SMALL DATA MARTS EFFECTIVELY

While data marts require significantly less infrastructure than data warehouses, they are still relatively expensive and time-consuming to implement and maintain. This is particularly true for integration with mainframe legacy data. Often, the high costs mean that you can’t justify the project at all.

Even when you can justify the costs, the required approval and implementation delays often mean you don’t get the system operating on a timely basis. These delays can be just as deadly to a data mart project.

  • What is needed is a method of implementing a data mart without the costs and delays typically encountered; in other words, a virtual data mart, or virtual data warehouse.

THE SOLUTION

Arbutus specializes in accessing non-relational legacy mainframe and AS/400 data. Arbutus Connect is middleware specifically tailored to move that data directly to Windows. The solution is simple, powerful, timely and cost-effective.

You can install and configure the software on your mainframe in a matter of days. As your needs change, you can add new tables in a matter of minutes, with no delays or additional costs.

In addition to accessing the data, Arbutus offers semantic and transformation layers that allow the implementation of virtually any data interface or business rule. You are not limited to merely exposing the data, but can dynamically fit it to your needs.

As an added benefit, you access live, current data, not an earlier – and possibly outdated – copy. This means that you can implement a virtual data mart very quickly and adapt it to fit your evolving needs, all without the typical bureaucratic process.

Web Services

Arbutus LegacyLink™ is Arbutus’ proprietary driver that provides SQL access to virtually any mainframe (zSeries, MVS) or AS/400 (iSeries) data source.
Supported mainframe data sources include:
  • VSAM
  • QSAM
  • DB2
  • IMS
  • ADABAS
  • PDS
  • GDG
All major mainframe data types are directly supported, and automatically converted for use by your web application.
These include dates formatted in any manner, including YY, YYYY, Julian, and serial formats, as well as:
  • EBCDIC
  • Packed
  • Zoned
  • Binary
  • Unsigned Packed
  • IBM Floating Point
The ConnectPlus driver is Windows-based, and is compatible with:
  • ODBC
  • JDBC
  • OLE/DB
  • Any other ODBC-based protocol

Setup takes mere hours on your mainframe, so you can immediately start to see results. Once defined, you access your mainframe data directly, as if it were stored in a data warehouse. If you prefer, you can easily stage your data on a Windows platform with a simple click-and-drag. There is no need to flatten your data first, as LegacyLink™ is compatible with virtually all data and file types.
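To illustrate what these automatic conversions involve, the Python sketch below decodes three of the formats listed above by hand: EBCDIC text, packed decimal (COMP-3), and a YYYYDDD Julian date. The field values are invented; the driver performs all of this for you, and the code merely shows the underlying encodings.

```python
from datetime import datetime

def unpack_comp3(data: bytes) -> int:
    """Decode an IBM packed-decimal (COMP-3) field: two digits per byte,
    with the sign carried in the final nibble (0xD means negative)."""
    nibbles = []
    for b in data:
        nibbles.append(b >> 4)
        nibbles.append(b & 0x0F)
    sign = nibbles.pop()  # last nibble is the sign, not a digit
    value = int("".join(str(n) for n in nibbles))
    return -value if sign == 0x0D else value

# EBCDIC text: Python's built-in cp037 codec handles the translation
name = b"\xC8\xC5\xD3\xD3\xD6".decode("cp037")          # → "HELLO"

# Julian (YYYYDDD) date: day-of-year parsing via %j
ship_date = datetime.strptime("2024060", "%Y%j").date()  # → 2024-02-29

# Packed decimal: 0x12 0x34 0x5C holds digits 1 2 3 4 5 plus a positive sign
balance = unpack_comp3(b"\x12\x34\x5C")                  # → 12345
```

In practice you never write this code: the driver recognizes each field's type from its definition and delivers ordinary strings, numbers, and dates to your application.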

Data-Driven Methodology Using Real Source Data

In many legacy data migration projects, the design and development stage can rely heavily on a review of source documentation. When these materials are absent, out-of-date or wrong, this stage represents one of the most significant risk areas.

That is precisely why Arbutus Migrate uses a data-driven methodology, using the real source data, rather than contrived test data. Migrate queries the source data directly, with no ETL programming required.

Advantages of Migrate’s legacy data migration:

  • Uses real source data early in the process, revealing issues that must be addressed up front
  • Helps you estimate the scope of the project more accurately
  • Minimizes your reliance on reviews of potentially inaccurate source documentation
  • Ensures important business and technical issues are not overlooked
  • Allows you to derive most of your metadata directly from the source data
  • Supports agile methodologies and an iterative approach to legacy data migration, with continuous fine-tuning, until all issues are thoroughly addressed
  • Increases the reliability of testing compared with using contrived test data
  • Allows users to quantify and identify data quality issues at the outset

Visit: Arbutus Migrate product page

Migrate

Legacy data migration is a challenging exercise, often resulting in unexpectedly high outlays for acquisition, support infrastructure, and qualified personnel.

Arbutus Migrate uses a data-driven methodology that reduces both the costs and risks associated with the design and development of legacy data migration projects. Migrate queries the source data directly, with no ETL programming required. Its read-only approach has two main benefits:

  1. Migrate uses real source data, rather than test data, allowing users to identify and quantify data quality issues at the outset
  2. Migrate cannot alter the source data, so the integrity of the underlying data is maintained
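The kind of up-front quality check this data-driven approach enables can be sketched in a few lines of Python. This is not Migrate's actual interface; the sample values and the Julian-date rule are assumptions chosen for illustration:

```python
from collections import Counter
from datetime import datetime

def profile_field(values, is_valid):
    """Classify raw field values so data quality issues can be
    identified and quantified at the outset of a migration."""
    counts = Counter()
    for v in values:
        if v is None or v == "":
            counts["missing"] += 1
        elif is_valid(v):
            counts["valid"] += 1
        else:
            counts["invalid"] += 1
    return dict(counts)

def valid_julian(v):
    """True if v parses as a YYYYDDD Julian date."""
    try:
        datetime.strptime(v, "%Y%j")
        return True
    except ValueError:
        return False

# Hypothetical sample drawn from a legacy date field
sample = ["2024060", "", "99ABC", "202400A", "2024001"]
report = profile_field(sample, valid_julian)
# → {'valid': 2, 'missing': 1, 'invalid': 2}
```

Because the profile is computed on real source data rather than contrived test data, surprises like the two malformed dates above surface before mapping and conversion rules are locked in.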

The technology behind Arbutus Migrate has been used for over 20 years by medium and large organizations to access and convert all types of complex legacy data sources. These purpose-built capabilities allow you to convert source data to the target system’s requirements efficiently and economically.

Key Capabilities of Arbutus Migrate

Usability

  • All-in-one application means minimal training and start-up time – Just one tool to learn and use for all data validation, cleansing, and conversion tasks
  • Not a legacy data migration expert? Not a problem! – Data conversion professionals at Arbutus are available to assist whenever needed
  • Flexibility to use other migration tools if required – Arbutus LegacyLink™ technology can seamlessly supply data to other applications
  • Will not affect performance of production systems – Data can be moved between different server platforms for quick and easy staging
  • Access historical data from decommissioned platforms – Fully compatible with all data set structures and data types encountered on your mainframe, Unisys, HP, or DEC platforms

Purpose-built functionality

  • Intuitive user interface reduces programming requirements – Over 80 built-in functions help extract and transform even the most complex data sources, including mainframe legacy data
  • Easily validate source data – Migrate offers extensive interactive testing and validation tools
  • Comprehensive audit trail – The entire data migration process is automatically documented

Cost-Effective

  • Cost-effective – With Migrate, there is no need to use an expensive ETL tool
  • Data-driven methodology – Migrate minimizes staffing requirements and allows you to complete projects without the usual labor-intensive IT skills
  • Achieve results quickly – Arbutus Migrate sets up in just days

 


Benefits of Staging the Data

When mainframe resource constraints make direct access solutions impractical, or if Arbutus does not offer a native server for your source platform, Arbutus also enables you to easily offload your processing to our open server on the Windows platform.

Having a central staging platform is also helpful when the project needs to bring disparate data sources together. Data from different platforms can be efficiently brought together for testing, validation, and conversion.

Unlike typical open server implementations, Arbutus’ open Windows server is fully compatible with all data set structures and data types encountered on your mainframe or AS/400. This means that the process of staging data is as simple as a file transfer, with no data transformation required. This movement of data can occur within the Arbutus environment or using alternative processes.

Arbutus Data Migration Services

Our consultants have many years of experience in legacy data migration projects involving mainframes and other systems, and are available to assist at any stage of your project. When needed, our consultants work alongside your team, guiding the use of “best practices” with Arbutus technology.

When our consultants become involved, your project timelines become our project timelines, to ensure an on-time and successful completion of your legacy data migration project.

Instant Warehouse

Arbutus Instant Warehouse provides a proven solution for two difficult data problems:

Instant Warehouse Data Warehouse Prototyping

Getting user participation and involvement is one of the key factors in a successful data warehouse project.

Instant Warehouse is a prototyping tool that provides you with powerful ways to visualize and analyze your data, while involving end-users at the earliest stages. Users are involved because you can prototype your application using the entire set of real data.

The Instant Warehouse agile modeling techniques reduce the need for extensive advance documentation reviews and dramatically reduce both the time and effort required to plan the project’s data access. In addition, these tools enable you to minimize up-front costs and reduce risks by limiting your exposure related to unexpected data problems or new requirements.


Benefits

  • Save time and money
  • Minimize documentation reviews at the start
  • Prototype the data access for your project and get it right the first time
  • Evolve the prototype to become the requirements document
  • Minimize or eliminate subsequent re-work
  • Start seeing results in days

Reduce risk

  • Ensure the project meets your needs, before it’s built
  • Involve users at the earliest stages
  • Gain acceptance before you implement

Know your source data

  • Process your source data directly, and identify problems at the earliest stages
  • Apply advanced data quality tools and techniques directly to the source data
  • Pre-test your conversion, mapping, and cleansing rules directly against the source data

 

Instant Warehouse Legacy Data Store

Many organizations continue to make their IT environments more efficient and cost-effective by decommissioning or consolidating their hardware platforms. One of the challenging outcomes of this process, however, is the storage and use of historical data. Migrating much of this data into a data warehouse cannot be cost-justified, yet the data is still needed for periodic and important use. Instant Warehouse is designed to provide a very cost-effective solution to this problem.

The Arbutus Windows Server is fully compatible with all data set structures and data types encountered on your mainframe, Unisys, HP, or DEC platform. This means that the process of staging data is as simple as a file transfer, with no data transformation required.

The result is that you can implement a data mart-style solution, with full access for any type of query, analysis, or reporting needs, in a matter of days.

Overview:

  • Continue to use historical data from decommissioned platforms like IBM mainframes, Unisys, HP, and DEC
  • Unconverted legacy data can be stored and utilized on any Windows Server
  • One-time set-up of Legacy Data Store for use by end-user applications can be done within hours or days
  • Eliminate the time and expense of moving all your legacy data into a data warehouse
  • Historical data and active data can be easily combined into one data source
  • Simplifies the re-purposing of historical data
  • No training needed for end-users to access the legacy data

Download: Arbutus Instant Warehouse brochure (PDF)

Visit: Learn more about Arbutus Instant Warehouse

Visit: Instant Warehouse for Business Intelligence

 

Connect

ODBC drivers for relational and non-relational data sources

Arbutus Connect ODBC Drivers offer direct, real-time access to all your corporate data sources, including mainframe sources such as: Adabas, DB2, IMS, ISAM, QSAM, VSAM, and other complex files (i.e., virtually any data file in table format). Connect Drivers can also access your SAP data, Internet-based sources, as well as data stored in other environments, such as Windows, so that all of your data is presented in a unified environment.

Connect ODBC Drivers provide your Windows, web, and server applications with instant ODBC connectivity to all of your relational and non-relational data sources, significantly lowering the cost of fully utilizing your legacy system data.

A Simple Solution for Complex IT Environments

You no longer need to create a data warehouse, data mart, or other repository to consolidate data for end-use reporting and analysis. With a short implementation timeframe, reporting and analysis ideas can go from the whiteboard to the boardroom in just days.

  • Direct Access – If required, Arbutus Connect ODBC Drivers can provide direct access to “live” data residing on multiple platforms via ODBC, so users experience zero data latency when querying data.
  • Staged Data – Staging your data on our Windows Server is as easy as a few clicks, with no ETL programming required. This helps avoid the expensive loading and set-up process, while providing an integrated view of all your data, with fast, efficient delivery to end-users.
  • Secure, Read-Only Access – While adhering to strong IT data management controls, all Arbutus technology, including Connect, provides read-only access to data.
  • Robust Security – Connect accesses corporate data using operating system and database authentication protocols only.
  • Compatibility – Connect is fully compatible with all common data access standards, including ODBC, JDBC, OLE DB, and ADO.NET.
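For example, once a Connect DSN has been configured on Windows, any ODBC-capable application can query the legacy data with standard SQL. The sketch below uses Python's third-party pyodbc module; the DSN, table, and column names are hypothetical, and the READONLY keyword stands in for whatever driver-specific option applies – consult the Connect documentation for actual values.

```python
def make_conn_str(dsn: str, uid: str, read_only: bool = True) -> str:
    """Build an ODBC connection string for a pre-configured DSN."""
    parts = [f"DSN={dsn}", f"UID={uid}"]
    if read_only:
        # Hypothetical keyword for illustration; read-only behaviour is
        # enforced by the Arbutus driver itself.
        parts.append("READONLY=1")
    return ";".join(parts)

if __name__ == "__main__":
    try:
        import pyodbc  # third-party ODBC bridge for Python

        # Hypothetical DSN pointing at mainframe data exposed by Connect
        conn = pyodbc.connect(make_conn_str("ARBUTUS_MAINFRAME", "analyst"))
        cur = conn.cursor()
        # Standard, parameterized SQL against a non-relational VSAM
        # file that the driver presents as an ordinary table
        cur.execute(
            "SELECT CUSTNO, BALANCE FROM AR_MASTER WHERE BALANCE > ?", 1000
        )
        for row in cur.fetchall():
            print(row.CUSTNO, row.BALANCE)
    except Exception as exc:
        print("ODBC demo skipped:", exc)
```

The same connection string works unchanged from JDBC bridges, OLE DB consumers, or ADO.NET providers, since they all resolve to the configured DSN.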

 

Connect ODBC Drivers