Big Data: Data Lake vs Data Warehouse


Dealing with big data can be a minefield! More data is created every second and, as our clients are only too aware, storing it safely is of paramount importance. As more companies find themselves accruing large amounts of data, working out the most business-appropriate way to store it deserves serious consideration.

Below we’ve outlined the two main data repositories: Data Warehouse and Data Lake, both of which have a number of benefits and drawbacks.

Data Warehouse vs Data Lake: A high-level breakdown

Data Warehouse

A Data Warehouse is an organised storage repository.

As part of the initial set up of a data warehouse, the data sources, business processes and inclusion/exclusion protocols must be set. As a general rule, data will only be included in the warehouse if a use has been identified.

The data within a data warehouse is stored, archived and ordered in a pre-defined way.
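To make the "pre-defined" part concrete, here is a minimal sketch of the idea using Python's built-in sqlite3 module as a stand-in for a warehouse. The table name, column names and sample rows are our own assumptions for illustration, not a recommendation for any particular product. The point is simply that the structure is agreed first, and data that does not fit it is turned away at load time.

```python
import sqlite3

# An in-memory database stands in for the warehouse.
# The "sales" table and its columns are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales (
        order_id   INTEGER PRIMARY KEY,
        region     TEXT NOT NULL,
        amount_gbp REAL NOT NULL CHECK (amount_gbp >= 0)
    )
    """
)


def load_row(row):
    """Insert a row only if it satisfies the pre-defined structure."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sales (order_id, region, amount_gbp) VALUES (?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = load_row((1, "UK", 120.50))
rejected = load_row((2, None, 75.00))  # region is missing, so NOT NULL fails
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Only the first row ends up in the table; the second is refused because it does not match the structure agreed at set up.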

Benefits:

  • All the data there has a specific purpose, which is defined during the set up.
  • When setting up a data warehouse, permissions can be set on a pre-agreed, role-by-role basis. This is great for allowing different levels of access to the information and means that particular business users will be able to report, analyse and extract information from the data as needed.
  • A data warehouse can provide a flexible, multi-layered security set up.
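As a rough illustration of role-by-role permissions, the sketch below maps each role to the datasets it may touch and the actions it may perform. The role names, dataset names and the is_allowed helper are all hypothetical; a real warehouse would enforce this through its own access-control features rather than application code like this.

```python
# Hypothetical roles and datasets, agreed up front as part of the set up.
ROLE_PERMISSIONS = {
    "finance_analyst": {"sales": {"read"}, "payroll": {"read"}},
    "marketing": {"sales": {"read"}},
    "warehouse_admin": {
        "sales": {"read", "write"},
        "payroll": {"read", "write"},
    },
}


def is_allowed(role, dataset, action):
    """Return True if the role may perform the action on the dataset."""
    return action in ROLE_PERMISSIONS.get(role, {}).get(dataset, set())


marketing_reads_sales = is_allowed("marketing", "sales", "read")
marketing_reads_payroll = is_allowed("marketing", "payroll", "read")
unknown_role_reads_sales = is_allowed("contractor", "sales", "read")
```

Anything not explicitly granted is denied, so a role that was never agreed (such as "contractor" here) gets no access at all.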

Drawbacks:

  • The business processes attached to the development and set up of a data warehouse mean that making any changes to the structure (once it is live) is not an easy task.
  • Data warehouses are usually too restrictive for data scientists who may need to go deeper when analysing and gleaning particular information.

 

Data Lake

A Data Lake is a single-store, unstructured repository.

Unlike a data warehouse, the data within a data lake is loaded unstructured and unorganised. It is not analysed or processed before it enters the repository; it can be loaded in its rawest form. Because data can be accepted from all sources and in all formats, a data lake may hold data that is never used.

Configuration (schema creation) takes place as and when the data within a data lake is required.
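This "configure it when you need it" approach can be sketched in a few lines of Python. Here a plain list stands in for the lake, and the field names, sample records and helper functions are our own assumptions for illustration. Nothing is checked on the way in; a schema is only applied at the moment someone asks a question of the data.

```python
import json

raw_store = []  # stands in for the data lake


def ingest(raw):
    """Accept anything, exactly as it arrives: no parsing, no validation."""
    raw_store.append(raw)


def read_with_schema(schema):
    """Apply a schema at read time, yielding only the records that fit it."""
    for raw in raw_store:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON: stays in the lake, skipped by this query
        if not isinstance(record, dict):
            continue
        try:
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # does not match this particular schema


ingest('{"order_id": "1", "region": "UK", "amount_gbp": "120.50"}')
ingest('{"sensor": "cam-7", "temp_c": 41.2}')
ingest("free-text log line, not JSON at all")

# The schema is defined only now, when the sales data is actually needed.
sales_schema = {"order_id": int, "amount_gbp": float}
sales = list(read_with_schema(sales_schema))
```

All three records stay in the store untouched. A different schema could be applied tomorrow to pull out the sensor readings instead, without reloading anything.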

Benefits:

  • The lack of structure means that changes to models and queries can be made more easily with a data lake. This flexibility makes data lakes appealing to many, as they can be configured and reconfigured as necessary.
  • Deep analysis is possible, which is useful for data scientists.
  • Data lakes can support all users and are available to all.
  • A data lake can hold all data until it is needed.
  • There is just the one store to manage so auditing and compliance become easier.

Drawbacks:

  • With all the data stored in one repository, there is a concern that the data is potentially more vulnerable.

Avoiding the Data Swamp

If a Data Lake isn’t properly maintained, there is a danger of it becoming a data swamp. This happens when the data within the lake deteriorates, or becomes useless or inaccessible to its users. Having a plan, vision and goal for your data lake is key.

Can we help you make sense out of your ever-growing data storage? Get in touch and find out more about our tailor-made storage solutions: hello@support-partners.com

2018-10-25T17:09:46+00:00 | August 30th, 2018

About the Author:

Nick is an experienced broadcast & post production technology consultant/engineer with a track record of analyzing, defining and implementing new technology & workflow strategy across multiple sectors (broadcast post & news, post production, government and MoD), whilst maintaining commercial focus and alignment to business drivers. Nick has extensive knowledge in shared storage, tapeless & file-based workflow with over 15 years of experience designing, implementing and supporting solutions from industry leading vendors.