Essentials of Data Governance

In the era of emerging technologies, data has become essential for organizations. With rapid digital transformation across industries, gaining a competitive advantage is crucial for thriving in the market. Today, data is the new “oil” that forms the core of an organization’s business growth. However, the rate of data generation has become enormous. A recent report by Towards Data Science puts data generation at a whopping 2.5 quintillion bytes per day, and current projections expect it to rise to 133 zettabytes by 2025.

In recent years, the number of data breach cases has doubled, and the possibility of a breach is an imminent threat to any business. To bolster data protection, it is of utmost importance to have a robust data governance framework. As per IBM data breach reports, the average cost of a data breach stands at $3.86 million globally, while in the USA it reaches $8.64 million.

A robust data governance framework is needed to tackle such challenges. Standard data governance ensures data security, quality, and integrity while providing traceability of data origins. Data governance can be implemented successfully when high-quality data is readily available along with crucial information on the data types, which is achievable with a data catalog. Besides, an organization attains firm control over its data usage policies when a regulatory body imposes stricter guidelines; today, several robust regulatory frameworks put a strong emphasis on data governance, the most well-known being the General Data Protection Regulation (GDPR). Furthermore, a data governance approach reaches its ultimate goal within an enterprise through its essential components, namely processes, policies, access controls, and data protection, which together encompass the entire data-related workflow. Tech giants such as Microsoft have contributed significantly to data governance requirements with the Azure Purview offering, which has achieved wide acceptance in the industry.

This article delves into the topic to provide deep insight into data governance and its regulations.

Data Governance Overview

Data governance is a strategy that incorporates an organization’s practices, processes, and technical requirements into a framework through which the organization can standardize its workflow, thereby protecting and appropriately managing its data assets. A useful data governance model must be scalable, ensuring that all policies, processes, and use cases are applied accurately as the business transforms into a data-driven enterprise.

Another crucial aspect of data governance is for an organization to conduct risk assessments and ensure compliance. The successful integration of data governance is determined by efficient data management and data security within the framework. An ideal governance policy must address the critical components of data storage, the original source, and a well-defined data access strategy. Furthermore, data governance solutions focus on providing response plans for misuse of data and unauthorized access.

Data governance and data management are often used synonymously, but it is essential to understand that data governance forms a significant part of a data management model.

Data Catalog

A data catalog acts as the inventory of the critical data assets in an organization, using metadata to manage the data more efficiently. Data professionals benefit from a data catalog as it helps in collecting and organizing data, making data easier to access, and improving the metadata to support data discovery and governance. With the enormous amount of data generated in the day-to-day functioning of an organization, finding relevant data for specific tasks becomes challenging. Additionally, data accessibility is demanding due to the organization’s own rules and a country’s legal regulations. The key factors to understand are how data moves within an organization: which individuals will have access to it and for what purpose. Such tracking protects the data by keeping unauthorized personnel away from it. Thus, a data catalog plays a crucial role in addressing several data-related challenges.

  • Providing all essential data from a single point, reducing the time spent searching for data.
  • Creating a business vocabulary.
  • Preventing data lakes from turning into data swamps.
  • Identifying the different structures of the data.
  • Making high-quality, reliable data available.
  • Enabling data reusability.
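The inventory-plus-metadata idea above can be sketched in a few lines of Python. This is an illustrative toy, not a real catalog product; the asset names, sources, and owners are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """Metadata record for a single data asset."""
    name: str
    source: str            # originating system, for traceability
    owner: str             # accountable data steward
    tags: list = field(default_factory=list)

class DataCatalog:
    """A minimal inventory of data assets, searchable by business tag."""
    def __init__(self):
        self._entries = {}

    def register(self, entry: CatalogEntry):
        self._entries[entry.name] = entry

    def search(self, tag: str):
        # Single point of access: find assets by business tag.
        return [e.name for e in self._entries.values() if tag in e.tags]

catalog = DataCatalog()
catalog.register(CatalogEntry("sales_2024", "crm", "alice", ["sales", "pii"]))
catalog.register(CatalogEntry("web_logs", "webapp", "bob", ["traffic"]))
print(catalog.search("sales"))  # ['sales_2024']
```

Even this sketch shows why a catalog reduces search time: one lookup replaces asking around for who owns which dataset.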

An organization can achieve a competitive advantage with the appropriate use of data. Therefore, the data should come from appropriate, trustworthy sources. Key members of an organization, such as C-level executives, use data for business decisions; for them, a data catalog becomes useful for examining cost savings and operational efficiency with a keen eye on fraud and risk analysis.

Data Governance Framework

A data governance framework allows an organization to focus on achieving its business goals and tackling data management challenges while providing the right means to attain them quickly and securely. Besides, the results of a data governance integration are scalable and measurable.

Figure. Key Participants in a Data Governance Framework.

 

Some of the essentials of a data governance framework are:

  • Use Cases

The data governance framework must address critical factors such as the use cases for various business scenarios in an organization. The use cases should link the need for a data governance framework to its contribution to achieving business goals. Ideally, they are derived from significant factors such as revenue, cost, and the associated risks, and address the enrichment of products and services, innovations, and market opportunities, along with the ability to achieve them at a reduced maintenance cost with efficiency, auditing, and data protection.

  • Quantification

Quantification is an absolute necessity, as it demonstrates the value of the data governance integration in the organization. A business needs to ascertain that it covers all the categorized use cases, with evidence to monitor performance and provide future insights.

  • Technical Benefits

With technology added to the workflow, data governance solutions can efficiently address critical components, thereby ensuring efficiency. The framework must address factors like the need for technology investment and the primary members who will work with data-related processes. A technical infusion in the workflow also enables easier discoverability of data definitions, data categories, and data lineage, and the appropriate classification of data as trustworthy or untrustworthy. It also makes it possible to create a feedback mechanism for resolving regulatory issues and policies concerning data usage.

  • Scalability

The data governance policies should be capable of providing scalable results. A scalable model provides growth opportunities for an organization by addressing problems across the data lifecycle. The primary focus is to introduce new tools that reduce operational costs and provide data protection for business growth.

Data Governance Processes

The data governance processes comprise the following.

  • The organization must be mindful of essential documents such as regulatory guidelines, statutes, company policies, and strategies.
  • A clearly defined workflow keeps legal mandates, policies, and objectives synchronized, helping the organization meet data governance and management compliance.
  • Data metrics are incorporated to measure the performance and quality of the data.
  • The principles of data governance are met.
  • Data security and privacy threats are identified.
  • Control measures ensure a smoother data flow with a precise analysis of the risks.
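The data-metrics item above can be made concrete with a small sketch. The completeness measure and the sample records below are illustrative assumptions, not a prescribed standard:

```python
def completeness(records, field):
    """Share of records where `field` is present and non-empty,
    a common starting point for data quality metrics."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

# Hypothetical customer records with one missing email
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
]

print(round(completeness(customers, "email"), 2))  # 0.67
```

In practice, such metrics would be computed per dataset and tracked over time against agreed quality thresholds.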

Data Governance Policies

Under data governance, various policies determine the effectiveness of the organization’s operational strategies. Policies related to data accessibility, data usage, and data integrity are crucial for a successful data governance implementation. The most important policies that an organization must follow for successful data management are as follows.

  • Data Structure Policy
  • Data Access Policy
  • Data Usage Policy
  • Data Integration Policy

Privacy and Compliance Requisites

Organizations handle significant amounts of highly sensitive data, and therefore need to follow the regulatory compliance side of data governance. In a business context, privacy refers to an individual’s right to control what personal data is collected and used, and which sensitive information should be restricted. As per EU directives, personal data is defined as data that contains the name, address, telephone number, or email address of an individual. Sensitive personal data, on the other hand, is distinguished clearly as data containing information on a person’s ethnicity, political opinions, religion, race, health, criminal convictions, or trade union membership. Such data has stricter guidelines that must be followed with due diligence.

Role of General Data Protection Regulation (GDPR)

The General Data Protection Regulation (GDPR) was established in 2016 with the primary aim of providing a framework for data privacy standards. GDPR states that any company looking to conduct business in Europe must adhere to its data protection norms. The regulation has strict guidelines that ensure the protection and privacy of personal data for individuals in the EU, and was an update to Europe’s previous Data Protection Directive.


Figure. Crucial Requirements of GDPR.

 

Under GDPR, the mandate’s territorial scope extends to any organization that processes personal data while offering business services in Europe, even without a physical presence there; such organizations and individuals are monitored for their service offerings under GDPR guidelines. The tracking of such services includes online businesses that require users to accept cookies to access their services. GDPR also differentiates the various data types and defines which data is considered personal data under the mandate.

Furthermore, direct and indirect data are interlinked with the identification of data subjects: the people who can be identified from the information presented in the data. The data in this context is personal information such as names, addresses, IP addresses, biometric data logs, citizenship-based identification, email, and profession.

Additionally, the GDPR mandate ensures that data is collected within the limits of the law and highly secured while it exists in the records of the organization, with stricter rules for its use. The primary categories of GDPR data governance requirements are:

  • There must be a classification of personal data, and personal identification data must have limited usability. Individuals can access their data and hold the right to request the removal or rectification of personal data. The mandate also states mandatory data processing requirements and the portability of data.
  • Data protection is a must and should cover all aspects of safeguarding the personal data collected. There must also be confidentiality, integrity, and availability of the data collected for business purposes. Organizations should adhere to data restoration regulations for scenarios involving data loss due to technical failure or accidents.
  • The collected data must be well-documented as per legal procedures.
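To make the classification requirement concrete, here is a minimal sketch that sorts hypothetical field names into the categories described above. Real GDPR classification is far more nuanced and context-dependent; this only illustrates the distinction:

```python
# Hypothetical field names; categories follow the personal vs.
# sensitive-personal distinction described in the text.
PERSONAL = {"name", "address", "telephone", "email"}
SENSITIVE = {"ethnicity", "political_opinion", "religion",
             "race", "health", "criminal_conviction", "trade_union"}

def classify(field_name):
    """Return the GDPR-style category for a field name."""
    if field_name in SENSITIVE:
        return "sensitive personal data"   # stricter guidelines apply
    if field_name in PERSONAL:
        return "personal data"
    return "non-personal"

print(classify("health"))   # sensitive personal data
print(classify("email"))    # personal data
```

An automated pass like this over a schema is typically only a first filter, reviewed by data protection personnel.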

Access Controls

Access controls form an integral part of access governance, regulating the accessibility of data. The critical areas covered comprise guidelines specifying who can access and view the data. Additionally, they require that the purpose of data access in the organization be stated. Compliance with access controls eliminates unauthorized access to data.
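A minimal sketch of purpose-aware access control follows, assuming a hypothetical policy table; a real implementation would live in an IAM system rather than application code:

```python
# (role, purpose) pairs permitted per data asset -- illustrative policy only
POLICY = {
    "customer_pii": {("analyst", "fraud-analysis"), ("dpo", "audit")},
}

def can_access(asset, role, purpose):
    """Grant access only when both the role AND the stated purpose
    match the policy for that asset (deny by default)."""
    return (role, purpose) in POLICY.get(asset, set())

print(can_access("customer_pii", "analyst", "fraud-analysis"))  # True
print(can_access("customer_pii", "analyst", "marketing"))       # False
```

The key point the sketch captures is that access is conditioned on the purpose of use, not just the person's role.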

As per the GDPR mandate, data protection requirements must be enforced through specific procedures.

  • There must be accountability for data protection requirements. Organizations involved in data processing activities must appoint data protection personnel to manage data and monitor its activities, and these appointees must ensure that data protection standards are met.
  • Data storage is an essential factor for data privacy. Organizations must therefore maintain a data map and data inventory to track the source of data and its storage, including the system that generated it, while tracking data lineage to provide comprehensive data protection.
  • Data accuracy is paramount, and organizations must keep data up to date to achieve high quality. Data quality reporting must also be followed to keep up with data quality standards.

Data Protection

A comprehensive data protection capability covers the following.

  • Data intelligence provisions for gaining insights with 360-degree visibility of data.
  • Identifying remedies for security and privacy issues.
  • Protecting sensitive data with access governance and ensuring no overexposed data exists.
  • Integrating artificial intelligence capabilities to identify dark data and its relationships.
  • Assigning labels with automation to protect data during the workflow and throughout the data lifecycle.
  • Rapid data breach notification and investigation.
  • Automated procedures for classifying sensitive and personal data.
  • Automated compliance and policy checks.
  • In-depth assessment of risk scores, with metrics depending on the data type, location, and access consent.
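The last item, risk scoring, can be sketched as a simple weighted sum. The weights and categories below are illustrative assumptions, not a standard model; real scoring is organization-specific:

```python
# Illustrative weights for the three factors named in the text
WEIGHTS = {
    "data_type": {"sensitive": 5, "personal": 3, "public": 0},
    "location":  {"external": 3, "internal": 1},
    "consent":   {"missing": 4, "granted": 0},
}

def risk_score(data_type, location, consent):
    """Combine data type, storage location, and access consent
    into a single additive risk score."""
    return (WEIGHTS["data_type"][data_type]
            + WEIGHTS["location"][location]
            + WEIGHTS["consent"][consent])

print(risk_score("sensitive", "external", "missing"))  # 12 (highest risk)
print(risk_score("public", "internal", "granted"))     # 1  (lowest risk)
```

Scores like these are then bucketed into thresholds that trigger reviews or remediation.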

Reimagining Data Governance with Microsoft Azure Purview

Azure Purview is a unified data governance service by Microsoft that enables managing and governing on-premises, multi-cloud, and software-as-a-service (SaaS) data. Users get access to a holistic, up-to-date map of their data with automated data discovery, while the classification of sensitive data becomes more manageable along with end-to-end data lineage. With Azure Purview, data consumers are assured of valuable and trustworthy data. Some of the key features of Azure Purview are discussed in the following section.

  • Unified mapping of data

The Purview data map feature establishes the foundation of practical data usage while following data governance standards. With Purview, it is possible to automate the management of metadata from hybrid sources. Consumers can take advantage of data classification with built-in classifiers that support Microsoft Information Protection sensitivity labels. Finally, all the data can be easily integrated using the Apache Atlas API.


Figure. Unified Data Mapping using Azure Purview.

 

  • Trusted Data

Purview offers a data catalog feature that allows easier search of data using technical terms from the data vocabulary. Data can then be identified according to its sensitivity level.

  • Business Insights

The data supply chain can be interpreted conveniently, from raw data to business insights. Purview offers the option to scan the Power BI environment and the analytics workspace automatically. Besides, all the assets can be discovered with their lineage in the Purview data map.

  • Maximizing Business Value

SQL Server data is more discoverable with a unified data governance service. It is possible to connect SQL Server to a Purview data map to achieve automated scanning and data classification.

  • Purview Data Catalog

The Purview data catalog supports importing existing data dictionaries and provides a business-grade glossary of terms that makes data discovery more efficient.

Conclusion

Business enterprises are generating a staggering amount of data daily, and the appropriate use of that data can be an asset for gaining business value. Therefore, organizations need reliable data that can provide meaningful business insights. Advanced technologies such as artificial intelligence and data analytics provide an effective way to integrate data governance into the operational workflow. Today, tech giants like Microsoft, with data governance offerings such as Azure Purview, have paved the way for other organizations to adopt data governance. Many startups have followed in their footsteps, acknowledging the importance of data governance for high-quality data while ensuring data privacy at all times, and now offer several data governance solutions in the market. A robust data governance framework is essential for maintaining the data integrity of a business and its customers.

 

 

Integrate Data Silos with Azure Synapse Analytics

The Roadblock for Digital Transformation

It is clearly established that digital transformation is the key to success, and even survival, for organizations, all the more so with the current global crisis due to COVID-19. 64% of executives believe that they have less than four years to complete digital transformation or they will go out of business, and 91% of global executives surveyed by Harvard Business Review feel that effective data and analytics strategies are essential for digital transformation. This data-driven culture is critical to spark innovation and drive efficiencies, which is crucial for survival.

But 80% of the respondents also say that their organizations are struggling to become mature users of data and analytics, even though 79% of employees use data and analytics at least once a week. What gets in the way of organizations effectively using data and analytics for business decisions?

More than half (55%) of the executives say the key roadblock stems from data silos and the difficulty of managing data coming from multiple systems. Digital transformation leads to a lot of data being captured across various systems, which can be extremely valuable. However, less than 20% of this data can ever be analyzed due to the silos, mainly because of the disconnect between big data analytics, enterprise data warehousing, and artificial intelligence/machine learning.

Simplifying Analytics

The need of the hour is to simplify analytics in a manner that breaks down these silos and makes the most of the data available for analysis without having to jump through hoops. In an ideal world, streaming operational data should be available for immediate analysis to generate reports and run models on the data. This is not a trivial problem.

Operational data is a mix of structured and unstructured data, generally stored in a data lake that is not directly suitable for analytics. Hence, the operational data needs to be imported into a data warehouse, where the reporting and analytics services can then run.

This creates three key issues. 

  1. Lag between the operational and analytics data stores due to the ELT pipeline. 
  2. Balancing the operational, ELT, reporting, and analytics workloads in the cloud. 
  3. Efficient and effective model management.

Organizations would really benefit from a framework which effectively addresses these issues and removes the roadblocks to data maturity. Azure Synapse Analytics is a step in the right direction with a big promise – Limitless Analytics Services in the cloud.

Azure Synapse Analytics to the Rescue


Microsoft has launched Azure Synapse Analytics to fulfill the promise of limitless analytics services. This service creates a single place to collaborate for Data Engineers, Database Administrators, Data Scientists, Business Intelligence Analysts, and Business Users with everyone accessing the same data.

The service offers a distributed query processing engine, a versatile form factor for computing (cluster/serverless), and a single experience for users to manage the end-to-end process. This provides much-required flexibility in scaling and a great user experience, which promotes collaboration.

Many features of Azure Synapse Analytics are now generally available with many more in the pipeline. We believe that this service will evolve rapidly into the standard for analytics at scale for organizations.

Benefits of Azure Synapse Analytics

Azure Synapse Analytics allows teams to seamlessly work together. However, the benefits go beyond this. Some additional benefits are:

1. Unified Experience

 


Azure Synapse Analytics allows users to ingest, prepare, manage, serve, visualize, and analyze data through a unified experience. Users can bring their analytics to where the data is located, rather than switching to a different interface, which gives a big boost to productivity.


2. Limitless Scale

Azure Synapse Analytics enables limitless scaling for data and analytics in the cloud. Data professionals can derive insights at speed from all the data across data warehouses and big data analytics systems. They can query both relational and non-relational data at petabyte scale using T-SQL, and benefit from a versatile form factor of clusters and serverless computing. Finally, they can run analytics alongside mission-critical workloads with intelligent workload management, workload isolation, and limitless concurrency.
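For example, Synapse serverless SQL pools can query files in the data lake directly with T-SQL's OPENROWSET, without loading them into warehouse tables first. The sketch below composes such a query in Python; the storage account and container names are hypothetical:

```python
def openrowset_query(storage_url, file_format="PARQUET", top=10):
    """Compose a T-SQL query for a Synapse serverless SQL pool.

    OPENROWSET lets the pool read files in a data lake in place,
    so operational data can be explored without an import step.
    """
    return (
        f"SELECT TOP {top} *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{storage_url}',\n"
        f"    FORMAT = '{file_format}'\n"
        f") AS rows;"
    )

# Hypothetical storage account and container
sql = openrowset_query("https://contosolake.dfs.core.windows.net/sales/*.parquet")
print(sql)
```

The resulting query string would be submitted through any T-SQL client connected to the workspace's serverless endpoint.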


3. Integrate Business Intelligence and Machine Learning

Azure Synapse Analytics allows users to integrate Power BI and Azure Machine Learning within the Azure Synapse Studio. Then BI professionals and Data Scientists can tap into the available data immediately to create faster insights.


4. Cloud-Native HTAP Implementation

The announcement of Azure Synapse Link (Preview) brings cloud-native hybrid transactional and analytical processing (HTAP) to Azure Cosmos DB, with plans to expand it to other data stores in the future. It creates a tight, seamless integration between Azure Cosmos DB and Azure Synapse Analytics, enabling users to run near real-time analytics over operational data stored in Azure Cosmos DB.


Want to learn more? A quick and informative video from Microsoft demonstrates the power of Azure Synapse Link.

5. Price-Performance

Price-performance is also a critical part of data solutions. According to Microsoft, Azure Synapse Analytics offers better price-performance as compared to Google BigQuery and Amazon Redshift based on field tests done by GigaOm.

Figure. Price-performance comparison of Azure Synapse Analytics, Google BigQuery, and Amazon Redshift. Source: GigaOm Report

The TPC-H and TPC-DS results published by Microsoft show a significant price reduction for Azure Synapse Analytics as compared to the others in the preceding and following graphics.

Figure. TPC-H and TPC-DS price comparison. Source: Microsoft

Speed

As demonstrated in this video from Ignite 2019, Azure Synapse Analytics can be blazing fast in a petabyte-scale environment combining relational and non-relational data. This can be a game changer for organizations where faster decision making can lead to substantial profit increase.


Getting Started


We have multiple offers that make it easy for organizations to get started with Azure Synapse Analytics, no matter what stage of the process they are at.

  1. Just getting started? We offer a free two-hour lunch-and-learn workshop to help you understand this service.
  2. Do you already know about the service but need help figuring out your next step? We can conduct an assessment, strategy, and roadmap workshop that will provide your organization with a plan for how to move forward.
  3. Do you have a roadmap but need help with implementation? We can get you started with a first pilot that can be completed in 2-4 weeks. Once you have experienced the value from the pilot, we can help you with the implementation as per the roadmap.

 

Contact us at info@optimusinfo.com to get started.

Starting a Data Project

It’s exciting to hear ‘Data is the new Oil’ or the ‘new Gold’ or the new ‘something valuable’. What I dread, though, is the day we hear ‘Data is the new fad and a complete waste of money’. I hope that day never comes!

A lot will depend on how businesses approach data projects. Right now, it could go either way. There are many organizations throwing money at data projects to ensure they are not left behind. There are many more who are not even getting started fearing the outcome or the futility of it. If you belong to either camp, I will share a simple process to maximize the return on your data projects.

Where Data Projects Fail

Data projects are complex and resource-intensive and hence have many failure points. Most are the failure points of any complex project: data availability, data quality, team quality, teamwork, communication, and so on. There is one, though, which is unique to data projects and at the root of all failed projects. It’s what I call the ‘rabbit hole question’. If a data project begins with this question, it is likely to fail.

The Rabbit Hole Question

This is the question I most often hear from companies wanting to start data projects, in some variation of ‘What can I do with my data?’. I agree that it is the most natural question to ask; however, it is not the question that is going to set you up for success. It is the dream question for the salesperson, who can now engage the solution architects, who will then build an exciting solution. A solution that is likely to cost a lot of money and take a lot of time. Worse, it may not yield any results. Why? Because it’s the ‘rabbit hole question’.

This question propels everyone to start thinking about everything that could be done with the data, or everywhere the algorithm or tool could be applied. There are many possibilities and hence many potential projects. But there is no way to figure out what we will get at the end of these projects; we will only discover it as we go along. And chances are we may not like what we see in the end, if we see anything at all.

Avoiding the Trap

So, how do we avoid the ‘rabbit hole question’? Where do we start, and how do we proceed to maximize our chance of success? The answer is to flip the question and ask ‘What can my data do for me?’. Better still, use a top-down approach that starts with your business objectives. The accompanying graphic illustrates a more sensible approach to data projects.

Figure. A top-down approach to data projects.

The key is to break down the process into two phases: Planning and Execution. Planning requires little time but a lot of thinking, and is crucial for success.

During planning, it is important to stop thinking about the data you have and what to do with it. Instead, start with the key objectives for your business. Next, think about the actions required to achieve those objectives. That leads to thinking about the kind of decisions we need to take, and then we can ask the question: “What insights do I need to take these decisions?”. These required insights then lead us to the relevant data and findings.

In this process, we may find that we do not have some of the required data and can start collecting it. In the meantime, we can switch to execution with the data we already have: use the data and findings to generate relevant insights, let these insights drive the appropriate decisions, and let those decisions guide us to the actions required to achieve our objectives.
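The objective-to-data chain described above can be pictured as a single row of a planning worksheet. The churn example and dataset names below are purely illustrative:

```python
# One row of a hypothetical planning worksheet, linked top-down:
# objective -> action -> decision -> insight -> data
plan = {
    "objective": "reduce churn by 10%",
    "action": "target at-risk customers with retention offers",
    "decision": "which customers to contact",
    "insight": "churn likelihood per customer",
    "data": ["usage logs", "support tickets", "billing history"],
}

def missing_data(plan, available):
    """Data we must start collecting before execution can cover it."""
    return [d for d in plan["data"] if d not in available]

print(missing_data(plan, {"usage logs", "billing history"}))
# ['support tickets']
```

Filling in such rows during planning makes the data gaps explicit, so collection and execution can proceed in parallel.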

Data Strategy Workshop

In our experience, the knowledge required for planning is available in the organization; it usually sits in different silos, though. Also, we find that the key stakeholders are usually not aligned.

Hence, we recommend conducting a Data Strategy workshop. Such a workshop aligns all stakeholders around the business objectives and then allows the group to connect those objectives all the way to the data they have.


The outcome of the workshop is an aligned Data & AI Roadmap. We can then jump into execution with the least effort and cost. The initial success builds confidence in the organization for further projects, and it also frees up the time of critical resources to contribute to those projects.


Optimus has already conducted Data & AI workshops for various organizations with fantastic results. If you would like your organization to have a clearly defined, cost-effective Data & AI Roadmap, please contact us at rajeev.roy@optimusinfo.com