Data Scientist at Data61
Natalia is a data scientist in the data platforms group at Data61, CSIRO. She is passionate about social network analysis, web mining and machine learning, with a specialization in data mining and link prediction. Her experience includes working on data-intensive projects in Ukraine, Austria, Japan and Australia.
YOW! Data 2016 Sydney
Automating Data Integration with Machine Learning
The world of data is a messy and unstructured place, making it difficult to gain value from data. Things get worse when the data resides in different sources or systems. Before we can perform any analytics in such a case, we need to combine the sources and build a unified view of the data. To handle this situation, a data scientist would typically go through each data source, identify which data is of interest, and define transformations and mappings that unify these data with the data from other sources. This process usually involves writing lots of scripts with potentially overlapping code – a real headache in the everyday life of a data scientist! In this talk we will discuss how machine learning techniques and semantic modelling can be applied to automate the data integration process.
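As a rough illustration of the manual process described above (not the machine learning approach presented in the talk), the sketch below unifies two made-up sources by hand. The source names, field names and helper functions are all hypothetical; the point is only that every new source needs its own hand-written mapping and transformations.

```python
# Hypothetical example: two sources holding the same kind of data
# under different field names and date formats.
crm_rows = [{"FullName": " Ada Lovelace ", "Joined": "2016-09-22"}]
billing_rows = [{"customer": "alan turing", "signup_date": "22/09/2016"}]


def clean_name(value):
    """Collapse extra whitespace and normalise capitalisation."""
    return " ".join(str(value).split()).title()


def iso_from_dmy(value):
    """Convert a DD/MM/YYYY string to YYYY-MM-DD."""
    day, month, year = str(value).split("/")
    return f"{year}-{month}-{day}"


# One hand-written mapping per source:
# unified field -> (source field, transformation)
crm_mapping = {
    "name": ("FullName", clean_name),
    "joined": ("Joined", str),
}
billing_mapping = {
    "name": ("customer", clean_name),
    "joined": ("signup_date", iso_from_dmy),
}


def unify(rows, mapping):
    """Apply a source-specific mapping to produce rows in the unified schema."""
    return [
        {target: transform(row[source])
         for target, (source, transform) in mapping.items()}
        for row in rows
    ]


unified = unify(crm_rows, crm_mapping) + unify(billing_rows, billing_mapping)
```

Each additional source means another mapping and, often, more near-duplicate helpers like `iso_from_dmy`. That repetition is the overhead an automated approach would aim to reduce.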