What is a Modern Data Platform?

A Modern Data Platform combines the functionality of a data lake, a data warehouse and agile data marts with the ability to scale without a large upfront investment.

A Modern Data Platform should:

  • Enable self-service for a diverse range of users, from data scientists to business users
  • Automate the ingestion and cataloging of information from multiple sources
  • Enable agile data marts and data management
  • Provide a trusted, controlled source of information for major decision making
  • Enable 360-degree feedback into operational processes
  • Provide a choice of front-end analytics and data discovery tools

A Modern Data Platform gives a wide range of users controlled access to your data in a format they can use, for example:

  • Executives and senior management, via data visualisations in storyboard dashboards with commentary and collaboration
  • Business Analysts
  • Data Scientists
  • Operational staff, with live feedback and recommendations
  • Automated updates to operational processes

Each of these user groups needs access to data with different levels of enhancement, aggregation and freedom. The data can originate from multiple locations:

  • ERP systems such as SAP…
  • Sensor data
  • Web site click stream
  • Customer interactions in a CRM or via email and messaging
  • Images

Historically, organisations tried to solve these challenges by building enterprise data warehouses, but these became very expensive and slow to respond to changing business requirements. Then came data lakes, which often turned into data swamps: if you dump all of the raw tables from your enterprise systems into a data lake and give everyone self-service reporting tools, only a very small number of expert users will be able to use the data accurately.

From an architectural perspective, a Modern Data Platform should have the following building blocks:

  • Be based on serverless cloud technology that enables different storage tiers (at different cost points) and the separation of storage and compute
  • Be able to store high volumes of structured and unstructured data at low cost
  • Be able to store transformed and enhanced data in a high-performance modern data warehouse with pay-per-use storage and compute
  • Be able to combine data from the data lake, the data warehouse and real-time API calls
  • Catalog and classify all data in the platform
  • Enable a combination of data ingestion and transformation approaches, including batch, real-time streaming and real-time API calls
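To make the cataloging requirement concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory catalog: `DataCatalog`, `ingest`, the two storage layers and the dataset names are illustrative, not any specific product's API. Whatever the ingestion mode (batch, streaming or API), every dataset is registered with its source and storage layer as it lands, so nothing enters the platform without a catalog entry.

```python
from dataclasses import dataclass, field
from enum import Enum


class IngestionMode(Enum):
    BATCH = "batch"
    STREAMING = "streaming"
    API = "api"


class StorageLayer(Enum):
    LAKE = "lake"            # low-cost storage for raw structured and unstructured data
    WAREHOUSE = "warehouse"  # pay-per-use store for transformed and enhanced data


@dataclass
class CatalogEntry:
    name: str
    source: str
    mode: IngestionMode
    layer: StorageLayer
    tags: set = field(default_factory=set)  # classification tags, filled in later


class DataCatalog:
    """Hypothetical in-memory catalog standing in for a managed catalog service."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        # Refuse duplicates so each dataset has exactly one catalog entry.
        if entry.name in self._entries:
            raise ValueError(f"dataset already registered: {entry.name}")
        self._entries[entry.name] = entry

    def by_layer(self, layer: StorageLayer) -> list:
        return sorted(e.name for e in self._entries.values() if e.layer is layer)


def ingest(catalog: DataCatalog, name: str, source: str, mode: IngestionMode,
           layer: StorageLayer = StorageLayer.LAKE) -> CatalogEntry:
    # Every ingestion path goes through here, so every dataset is registered on arrival.
    entry = CatalogEntry(name, source, mode, layer)
    catalog.register(entry)
    return entry


catalog = DataCatalog()
ingest(catalog, "erp_orders", "ERP", IngestionMode.BATCH)
ingest(catalog, "clickstream", "website", IngestionMode.STREAMING)
ingest(catalog, "orders_enriched", "erp_orders", IngestionMode.BATCH, StorageLayer.WAREHOUSE)
print(catalog.by_layer(StorageLayer.LAKE))  # ['clickstream', 'erp_orders']
```

In a real platform this role would be played by a managed catalog service, and registration would be triggered by the ingestion pipelines themselves rather than called by hand.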

From a governance, control and quality perspective:

Having all of your data easily available to your employees might sound great, but if it gets into the wrong hands it can cause major problems, especially if it is customer data, PID or data that gives you a competitive advantage.

The data platform needs to classify data by quality and apply controls that continuously check the quality of data at each level. Security classifications and tags are equally important: the majority of users should only have access to data that has been classified and tagged for both quality and security.
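As an illustration of the "classified and tagged" rule, here is a minimal Python sketch. The quality levels, security levels, user attributes and the `classify_quality` completeness rule are all hypothetical assumptions chosen for the example; a real platform would enforce this through its catalog and access-control services. Data missing either tag stays hidden from ordinary users, and a simple check decides whether data is promoted from raw to validated.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Quality(IntEnum):
    RAW = 1        # landed as-is, no checks applied
    VALIDATED = 2  # passed automated quality checks
    CERTIFIED = 3  # trusted source for major decision making


class Security(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3  # e.g. customer data, PID or competitively sensitive data


@dataclass(frozen=True)
class Dataset:
    name: str
    quality: Optional[Quality] = None    # None = not yet classified for quality
    security: Optional[Security] = None  # None = not yet tagged for security


@dataclass(frozen=True)
class User:
    name: str
    clearance: Security
    min_quality: Quality = Quality.VALIDATED
    may_see_untagged: bool = False  # reserved for a small group of expert users


def classify_quality(rows: List[dict], required: List[str]) -> Quality:
    # Hypothetical rule: VALIDATED only if every row populates every required field.
    ok = all(row.get(col) not in (None, "") for row in rows for col in required)
    return Quality.VALIDATED if ok else Quality.RAW


def can_access(user: User, ds: Dataset) -> bool:
    if ds.quality is None or ds.security is None:
        # Data without both tags is hidden from everyone except trusted experts.
        return user.may_see_untagged
    return ds.security <= user.clearance and ds.quality >= user.min_quality


def visible(user: User, datasets: List[Dataset]) -> List[str]:
    return [d.name for d in datasets if can_access(user, d)]


datasets = [
    Dataset("sales_summary", Quality.CERTIFIED, Security.INTERNAL),
    Dataset("customer_contacts", Quality.VALIDATED, Security.RESTRICTED),
    Dataset("raw_clickstream", Quality.RAW, Security.INTERNAL),
    Dataset("new_sensor_feed"),  # arrived but not yet classified or tagged
]
analyst = User("business_analyst", clearance=Security.INTERNAL)
print(visible(analyst, datasets))  # ['sales_summary']
```

Ordering the levels as integers keeps the policy down to two comparisons; finer-grained controls, such as masking individual columns of customer data, would sit on top of a rule like this.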

For more information contact: info@citras.io