
Data Quality

One of the main problems companies face when developing analytical systems is insufficient attention to data quality. Systems built on incomplete, incorrect, or contradictory data cannot be used to solve real management tasks or to make decisions. Automated error detection and correction, along with deduplication of consolidated data (which often contains manual-input mistakes), are among the most difficult technical problems, and they can be solved successfully by means of advanced analytics.

Faunus Analytics provides a full range of services in the field of data quality (DQ) systems development. Our core expertise is built on the world-leading solution DataFlux (part of SAS Institute).

The standard data quality management cycle includes the following phases:

  1. Data quality audit (profiling). Assessing the current state of data quality in the organization. All further work is based on the results of this phase.
  2. Parsing. Splitting data elements into components (e.g., a ZIP code, settlement, street, and house number can be extracted from a postal address).
  3. Standardization. Bringing the parsed components to a single unified format.
  4. Clustering. Grouping different records that relate to the same object. Special methods recognize and link data fragments that often refer to the same object only implicitly (e.g., the personal data of one person in the databases of an online social network, a restaurant chain, air-ticket agencies, etc.).
  5. Surviving record identification. Suppose three records from different data sources all relate to the same object, a person. Records from different databases may lack parts of the required information (e.g., first and last name, address, or phone number) or may contradict each other (e.g., contain different addresses). Analyzing the combined data makes it possible to create one «correct» record containing complete and up-to-date information about the object. This process is also known as deduplication.
  6. Enrichment. Initial data can be complemented with data from external sources (a credit bureau, an address database, etc.).
  7. DQ process monitoring. Creating and setting up business rules makes it possible to monitor DQ processes and determine their impact on the overall efficiency of the enterprise.
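As an illustration, the middle phases of this cycle (parsing, standardization, clustering, and surviving-record identification) can be sketched with the Python standard library alone. The address format, normalization rules, and similarity threshold below are assumptions made for the example and do not reflect DataFlux functionality:

```python
# A minimal sketch of phases 2-5 of the DQ cycle, assuming a simple
# "ZIP, city, street" address layout. All rules here are illustrative.
import re
from difflib import SequenceMatcher

def parse_address(raw):
    """Phase 2 (parsing): split a raw postal address into components."""
    m = re.match(r"\s*(\d{5,6})?[,\s]*([A-Za-z .-]+?),\s*(.+)", raw)
    if not m:
        return {"zip": "", "city": "", "street": raw.strip()}
    zip_code, city, street = m.groups()
    return {"zip": zip_code or "", "city": city.strip(), "street": street.strip()}

def standardize(record):
    """Phase 3 (standardization): one unified format for every component."""
    return {k: v.strip().upper().replace("STR.", "STREET")
            for k, v in record.items()}

def cluster(records, threshold=0.85):
    """Phase 4 (clustering): group records with similar city + street."""
    groups = []
    for rec in records:
        key = rec["city"] + " " + rec["street"]
        for group in groups:
            gkey = group[0]["city"] + " " + group[0]["street"]
            if SequenceMatcher(None, key, gkey).ratio() >= threshold:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups

def survive(group):
    """Phase 5 (surviving record): first non-empty value wins per field."""
    return {field: next((r[field] for r in group if r[field]), "")
            for field in group[0]}

raws = ["127018, Moscow, Suschevsky Val str. 18",
        ", Moscow, Suschevsky Val street 18",
        "127018, Moscow, Suschevsky Val street 18"]
parsed = [standardize(parse_address(r)) for r in raws]
golden = [survive(g) for g in cluster(parsed)]
print(golden)
```

In a production system each phase would rely on industry-specific knowledge bases and far more robust matching rules than the plain string-similarity ratio used here.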

Faunus Analytics specialists also have unique expertise in solving nonstandard DQ tasks:

  • Recovering text written in nonstandard transliteration
  • Grouping data by fuzzy criteria
  • Implementing nonstandard text data parsing procedures
  • Creating specialized knowledge bases (including industry-specific ones)
  • Checking data for compliance with special requirements (e.g., Basel II for financial institutions)
  • Improving the efficiency of master data management (MDM) processes
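The fuzzy-grouping task above can also be sketched briefly: the example below clusters transliteration variants of the same name by string similarity. The normalization table and the 0.75 threshold are illustrative assumptions, not production rules:

```python
# Illustrative sketch: group transliteration variants of the same name.
from difflib import SequenceMatcher

# Collapse common transliteration variants before comparison (assumed rules).
VARIANTS = [("KH", "H"), ("IY", "I"), ("YI", "I"), ("II", "I"), ("Y", "I"), ("'", "")]

def normalize(name):
    s = name.upper()
    for old, new in VARIANTS:
        s = s.replace(old, new)
    return s

def fuzzy_group(names, threshold=0.75):
    """Place each name in the first group whose representative is similar enough."""
    groups = []
    for name in names:
        key = normalize(name)
        for group in groups:
            if SequenceMatcher(None, key, normalize(group[0])).ratio() >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

print(fuzzy_group(["Yuriy", "Yurii", "Iurii", "Smirnov", "Smirnoff"]))
# Variants of "Yuri" fall into one group, variants of "Smirnov" into another.
```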

All trademarks of the technologies and products mentioned above belong to their respective owners.
