raw2a

Product Overview


January 2004

During the course of its journey from network element to billing system, a network data record may traverse several subsystems, undergoing many transformations along the way. When things go wrong, it's often hard to identify the culprit. Each subsystem involved may have its own proprietary utilities for examining the records it handles, but juggling several such tools makes the troubleshooter's job difficult, especially when a subsystem software upgrade changes the record format. In a time when individual file sizes may reach half a gigabyte, and hundreds of millions of usage and other records are collected in a single day, it's like trying to find a needle in a haystack. There is typically no single utility provided by a given subsystem vendor that can analyze records regardless of source, and regardless of format, let alone parse all types of files, including those containing usage data, alarms, traffic, or even non-network data.

raw2a - A Format-Independent Parser
raw2a is just such a utility. raw2a makes it easy to find and parse records in files, regardless of the format of the record or the file. The data in the file can be of any type, including all types of network and non-network binary and text data. Currently raw2a is used primarily to analyze telecommunications usage records in files produced by a switch or other network element, by a mediation system, or by a billing system or other downstream application. raw2a converts the binary format of most such records into human-readable form. As a Unix utility, raw2a is capable of searching any file that can reside in a Unix filesystem, and of outputting all or a subset of records and fields. Because of its unique and revolutionary pattern matching mechanism and table-driven record parsing, raw2a can be quickly reconfigured to respond to changes in record or file formats. raw2a finds the needle.



Decades of Experience
raw2a was written by an expert in telecommunications network management, with over 30 years of experience writing software for network elements and mediation systems. The product has been in continuous development for over 10 years, and has gone through several major revisions. The goal has always been to reduce the amount of time necessary to respond to changes in the network, and to increase the flexibility of the program. The result is a mature, extremely powerful utility that includes absolutely no network-element-specific code, and of which it can reliably be said that there is no format which it cannot parse.

Among the things that raw2a can do are:

Finding the Needle
Perhaps the most important function of raw2a is searching for records among the mountains of data generated by the network. Troubleshooting network problems often involves finding specific CDRs. For example, many groups within a service provider use test calls to verify proper operation of the network, the mediation system, and downstream applications. The revenue assurance group may periodically make a group of test calls to verify that the billing system receives all records and bills for them properly. But what happens when the calls don't appear on the bill? Where in the chain of subsystems has the record gone astray? Using raw2a, the troubleshooter can quickly locate the record in question, and examine it for improper formatting, unexpected field values, etc.
Requests from law enforcement agencies are just one of many other reasons to search for specific records. Such searches may involve selection by calling and called number, bounded by a certain time period, etc.
To search for records, one could use the output of a generic parsing program piped into a Unix utility like grep to isolate the records of choice. But this is of course grossly inefficient, given the volume of data involved. By contrast, raw2a provides robust built-in facilities for finding the records that are needed, and only those records. Command-line switches allow the user to specify that only records whose fields match certain values be output. The user can specify a starting and ending record number, or specify a starting record to be output followed by only the next n records. For each record, the user can specify just which fields are to be output. And switches can of course be combined to further isolate the data of interest.
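The kinds of selection described above can be illustrated with a short Python sketch. It is not raw2a's implementation and does not show raw2a's command-line syntax: the function `select_records`, its parameters, and the sample field names are all hypothetical, chosen only to show how value matching, a record range, and field selection combine in a single pass over the data.

```python
from itertools import islice

def select_records(records, match=None, start=1, count=None, fields=None):
    """Yield (record_number, record) pairs that satisfy the selection criteria.

    records: iterable of dicts, one dict of field name -> value per record
    match:   dict of field name -> required value; every entry must match
    start:   1-based number of the first record to consider
    count:   how many records to consider from `start` (None = no limit)
    fields:  field names to keep in the output (None = all fields)
    """
    match = match or {}
    stop = None if count is None else start - 1 + count
    numbered = enumerate(records, start=1)
    for number, record in islice(numbered, start - 1, stop):
        if all(record.get(name) == value for name, value in match.items()):
            if fields is not None:
                record = {name: record[name] for name in fields if name in record}
            yield number, record

# Hypothetical parsed records; the field names are invented for illustration.
sample = [
    {"calling": "5551000", "called": "5552000", "duration": 12},
    {"calling": "5551001", "called": "5552000", "duration": 340},
    {"calling": "5551000", "called": "5553000", "duration": 45},
    {"calling": "5551000", "called": "5552000", "duration": 7},
]

# Combine criteria: start at record 2, match one calling number, keep two fields.
hits = list(select_records(sample, match={"calling": "5551000"},
                           start=2, fields=["called", "duration"]))
for number, record in hits:
    print(number, record)
```

Because the sketch is a generator, records are filtered as they are read rather than after everything has been printed, which is the efficiency argument made above against piping full output through grep.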

When Something Goes Wrong
Occasionally a data file is damaged, due to a misapplied software update, an improperly defined parameter, or the like. Subsystems which process the bad data may fail in unpredictable ways. Typically it is not just a single record that is damaged, but entire blocks of data. Using raw2a in combination with a file editing utility like rawpkg (available from Dugger & Associates), the nature of the damage can be found and usually corrected with minimal loss of data.
raw2a helps the user deal with damaged files in several ways. First, its robust parsing mechanism allows useful information in the file to be examined even if it does not appear at its usual position. Second, when raw2a finds data that does not fit into any record, it reports the data as a "gap". This localizes the damage to specific areas in the file. Suspecting damage, the user can easily examine the output of raw2a for evidence of unusual gaps. Third, raw2a can be directed to output to a new file all found records in the original binary format.
One of the most important indicators of a damaged file involves sequence numbers. The first clue that there is a problem is usually an alarm from the mediation system that sequence numbers are duplicated or missing. raw2a provides extensive tools for reporting missing or duplicate sequence numbers in a single file or across a group of files. As it parses a file, raw2a records evidence of sequence number issues to an "unresolved blocks" file. For each new data file parsed, raw2a will update the unresolved blocks file to reflect newly-found data. The user controls the size of the unresolved blocks file by specifying how quickly records are aged out of the file, or by directly specifying a maximum size for the file.
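As a rough illustration of sequence number checking across a group of files, the following Python sketch keeps running state between files. It is an assumption-laden stand-in, not raw2a's algorithm: the class `SequenceTracker` and its attributes are hypothetical, the state lives in memory rather than in an "unresolved blocks" file, and aging, size limits, and sequence number wraparound are all omitted.

```python
class SequenceTracker:
    """Tracks missing and duplicate sequence numbers across several files."""

    def __init__(self):
        self.next_expected = None  # one past the highest number seen so far
        self.missing = set()       # numbers skipped and not yet found
        self.duplicates = []       # numbers seen more than once
        self._seen = set()

    def update(self, numbers):
        """Process the sequence numbers of one file, in the order read."""
        for n in numbers:
            if n in self._seen:
                self.duplicates.append(n)
                continue
            self._seen.add(n)
            if n in self.missing:
                # A late arrival resolves a previously reported gap.
                self.missing.discard(n)
            elif self.next_expected is None or n >= self.next_expected:
                if self.next_expected is not None:
                    # Everything between the expected number and n was skipped.
                    self.missing.update(range(self.next_expected, n))
                self.next_expected = n + 1

tracker = SequenceTracker()
tracker.update([1, 2, 4, 4, 7])  # first file: 3, 5, 6 skipped; 4 repeated
tracker.update([5, 8, 10])       # second file: 5 turns up late; 9 skipped
print(sorted(tracker.missing), tracker.duplicates)  # [3, 6, 9] [4]
```

Carrying the state from one call to the next is what lets a number missing from one file be resolved by a later file, which is the role the text assigns to the unresolved blocks file.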

Discovering the Truth
Another kind of "damage" occurs when the format of the file or record changes unexpectedly. Perhaps a network element software load has been upgraded with insufficient notice, or a technician has modified a parameter that causes the structure of the file to change. Again the robust parsing mechanism of raw2a comes to the rescue, because it allows the user to extract all useful data from the file, and to determine from reports of "gaps" of invalid data just where in the file or record the format has changed.
A related use of raw2a whose importance should not be underestimated is in the exploration, understanding, and discovery of record formats. Network element vendor documentation of formats, and of changes to those formats, is often confusing, inaccurate, incomplete, or even missing entirely. raw2a can be used as an educational tool. What are the possible values of a field? Given a certain value in the field, what other subrecords or fields can appear, and what are their values? How many records can appear in a file? Exploring records in this way is especially important to understand and verify changes to a record format. This can help minimize delays in receiving updates from subsystem vendors.

Speeding Response to Change
The key to the operation of raw2a is the Record Definition File (RDF). All parsing for a specific file and record type is driven from this simple text file. The RDF allows the user to define the file format, the record and subrecord types found in the file, the fields found in each record, and the possible values of each field. All components are given names, which raw2a uses when printing the records. Multiple data formats are supported, including true binary, BCD, EBCDIC, XML, etc.
Because all necessary parsing instructions are given in this separate text file, the vast majority of raw file and record format changes introduced by a subsystem vendor require no change to raw2a itself. Editing the RDF with a simple text editor is all that is required.
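The idea of table-driven parsing can be sketched in Python as follows. The table below is a Python stand-in for a record definition and does not reproduce RDF syntax, which is not shown in this overview. The layout `CALL_RECORD`, its field names, offsets, and lengths, and the helper functions are hypothetical.

```python
def decode_bcd(data):
    """Decode packed BCD (two digits per byte) into a digit string."""
    return "".join(f"{byte >> 4:x}{byte & 0x0F:x}" for byte in data)

def decode_uint(data):
    """Decode a big-endian unsigned binary integer."""
    return int.from_bytes(data, "big")

def decode_ebcdic(data):
    """Decode EBCDIC text (code page 037) into a Python string."""
    return data.decode("cp037")

DECODERS = {"bcd": decode_bcd, "uint": decode_uint, "ebcdic": decode_ebcdic}

# Hypothetical record layout: (field name, offset, length, encoding).
# Only this table would need editing if the record format changed.
CALL_RECORD = [
    ("record_type",      0, 1, "uint"),
    ("calling_number",   1, 4, "bcd"),
    ("duration_seconds", 5, 2, "uint"),
    ("status",           7, 2, "ebcdic"),
]

def parse_record(data, layout):
    """Decode each field named in `layout` from the raw bytes of one record."""
    return {
        name: DECODERS[encoding](data[offset:offset + length])
        for name, offset, length, encoding in layout
    }

# Invented sample bytes: type 1, BCD digits 55512345, duration 42, EBCDIC "OK".
raw = bytes([0x01, 0x55, 0x51, 0x23, 0x45, 0x00, 0x2A, 0xD6, 0xD2])
parsed = parse_record(raw, CALL_RECORD)
print(parsed)
```

The parsing function contains no knowledge of any particular record. A new field or a changed offset is a one-line change to the table, which mirrors the claim that most format changes require only an edit to the RDF.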
A key technology exclusive to raw2a is called "patterns", which is a form of binary regular expression. Patterns that appear in the RDF define a unique "signature" for each structure in the file, whether it be a file or block header, or a data record. As raw2a traverses the file, it attempts to match each possible pattern against subsequent data. If it finds a match, it proceeds to parse the fields for the record. Otherwise, it shifts position one byte further into the file, and again attempts to pattern match.
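The scanning loop just described can be approximated in Python using the standard `re` module on bytes. This is only an analogy: raw2a's pattern language is its own and is not shown here, and the signatures, marker bytes, and record lengths below are invented for illustration. Unmatched bytes are collected as "gap" entries, in the sense used earlier in this overview.

```python
import re

# Hypothetical signatures. Each pattern identifies a structure and spans
# its full length, so a successful match also delimits the record.
SIGNATURES = {
    "file_header": re.compile(rb"\xF0\xF0.{2}", re.DOTALL),
    "call_record": re.compile(rb"\xAA[\x01-\x03].{4}", re.DOTALL),
}

def scan(data):
    """Return (kind, offset, bytes) entries that together cover all of `data`."""
    found = []
    pos = 0
    gap_start = None
    while pos < len(data):
        for kind, pattern in SIGNATURES.items():
            m = pattern.match(data, pos)
            if m:
                if gap_start is not None:
                    # Close the run of unmatched bytes that preceded this match.
                    found.append(("gap", gap_start, data[gap_start:pos]))
                    gap_start = None
                found.append((kind, pos, m.group()))
                pos = m.end()
                break
        else:
            # No signature matched here: shift one byte and try again.
            if gap_start is None:
                gap_start = pos
            pos += 1
    if gap_start is not None:
        found.append(("gap", gap_start, data[gap_start:]))
    return found

# Invented data: header, one record, three bytes of damage, another record.
data = bytes.fromhex("f0f00001 aa0111223344 00ff13 aa0255667788")
result = scan(data)
for kind, offset, chunk in result:
    print(f"{offset:4d}  {kind:12s} {chunk.hex()}")
```

In this sample the three damaged bytes at offset 10 are reported as a gap, and the record that follows them is still found, because the scan resynchronizes on the next signature instead of assuming fixed positions.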
The use of patterns provides two major benefits. First, it allows for the parsing of all possible file formats, with all possible record formats. Second, it allows raw2a to find records in the file even if the file format has been changed, or if the file has been damaged. In doing so, raw2a reports the changes that it finds, so that the user can modify the RDF or repair the file, as appropriate.
More than one RDF can be created for a given network element's data. For example, an abbreviated RDF containing only enough parsing instructions to determine the sequence number of each record can be used to speed processing when the user is looking for sequence number gaps or duplicates, and is not interested in the details of each record.
raw2a is supplied with a library of RDFs, including support for the following network elements. New RDFs are easily written.

Saving Time and Money
The fruit of decades of experience, utilizing unique parsing and searching technology, handling high volumes of data of all types, raw2a greatly speeds the identification and correction of network problems.


© Copyright 2004 Dugger & Associates, Inc.