Data Engineer and Analyst with experience in machine learning, mobile development, SQLite databases, data warehouses (Amazon Redshift), ETL pipelines, Amazon Web Services, and third-party open source ports/integrations for the mobile software domain. Interested in staying on the cutting edge of computer science research and in continual learning.
YOW! Data 2016 Sydney
Unit Testing Data
“Can I trust this data?” This question can be difficult to measure and answer objectively. Just as unit tests provide metrics for code coverage and catch bug regressions, the techniques and recipes shown in this talk quantify data sanitisation and coverage. The talk also demonstrates an extensible design pattern that allows further tests to be developed.
If you can write a query, you can write data unit tests. These strategies have been implemented at Invoice2go in their ETL pipeline for the last 2 years to detect data regressions in their Amazon Redshift data warehouse.
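As a rough illustration of the "if you can write a query, you can write a data unit test" idea, the sketch below expresses each test as a SQL query that counts offending rows, so a result of zero means the test passes. It is not the implementation used at Invoice2go: the `invoices` table, its columns, the test names, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for Amazon Redshift so the example runs anywhere.

```python
import sqlite3

# Hypothetical data unit tests: each query counts rows that violate a rule.
# A count of 0 means the test passes. Adding a test means adding one entry.
DATA_TESTS = {
    "invoice_id_not_null": "SELECT COUNT(*) FROM invoices WHERE id IS NULL",
    "invoice_total_not_null": "SELECT COUNT(*) FROM invoices WHERE total IS NULL",
    "invoice_total_non_negative": "SELECT COUNT(*) FROM invoices WHERE total < 0",
    "invoice_id_unique": (
        "SELECT COUNT(*) FROM ("
        "SELECT id FROM invoices GROUP BY id HAVING COUNT(*) > 1)"
    ),
}


def run_data_tests(conn, tests):
    """Run each query and return a mapping of test name to offending-row count."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in tests.items()}


def build_sample_db():
    """Create an in-memory table with deliberately dirty rows (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?)",
        [(1, 10.0), (2, None), (2, 25.5), (3, -4.0)],
    )
    return conn


if __name__ == "__main__":
    results = run_data_tests(build_sample_db(), DATA_TESTS)
    for name, bad_rows in results.items():
        status = "PASS" if bad_rows == 0 else "FAIL"
        print(status, name, bad_rows)
```

Keeping the tests in a plain dictionary is one simple way to make the pattern extensible: a new check is just another named query, and the counts can be logged per ETL run to spot regressions over time.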