Data lakes

Published
London : ISTE, Ltd. ; Hoboken : Wiley, 2020.
Status
Available Online

Description

The concept of a data lake is less than 10 years old, but data lakes are already widely implemented within large companies. Their goal is to deal efficiently with ever-growing volumes of heterogeneous data while also meeting a variety of sophisticated user needs. However, defining and building a data lake remains a challenge, as no consensus has been reached so far. Data Lakes presents recent outcomes and trends in the field of data repositories. The main topics discussed are the data-driven architecture of a data lake; the management of metadata, which supplies key information about the stored data, master data and reference data; the roles of linked data and fog computing in a data lake ecosystem; and how gravity principles apply in the context of data lakes. A variety of case studies are also presented, providing the reader with practical examples of data lake management.

More Details

Language
English
ISBN
9781119720430, 1119720435, 1119720427, 9781119720423

Notes

General Note
5.1.1. Data lake definition
Bibliography
Includes bibliographical references and index.
Local note
O'Reilly O'Reilly Online Learning: Academic/Public Library Edition

Table of Contents

Cover
Half-Title Page
Dedication
Title Page
Copyright Page
Contents
Preface
1. Introduction to Data Lakes: Definitions and Discussions
1.1. Introduction to data lakes
1.2. Literature review and discussion
1.3. The data lake challenges
1.4. Data lakes versus decision-making systems
1.5. Urbanization for data lakes
1.6. Data lake functionalities
1.7. Summary and concluding remarks
2. Architecture of Data Lakes
2.1. Introduction
2.2. State of the art and practice
2.2.1. Definition
2.2.2. Architecture
2.2.3. Metadata
2.2.4. Data quality
2.2.5. Schema-on-read
2.3. System architecture
2.3.1. Ingestion layer
2.3.2. Storage layer
2.3.3. Transformation layer
2.3.4. Interaction layer
2.4. Use case: the Constance system
2.4.1. System overview
2.4.2. Ingestion layer
2.4.3. Maintenance layer
2.4.4. Query layer
2.4.5. Data quality control
2.4.6. Extensibility and flexibility
2.5. Concluding remarks
3. Exploiting Software Product Lines and Formal Concept Analysis for the Design of Data Lake Architectures
3.1. Our expectations
3.2. Modeling data lake functionalities
3.3. Building the knowledge base of industrial data lakes
3.4. Our formalization approach
3.5. Applying our approach
3.6. Analysis of our first results
3.7. Concluding remarks
4. Metadata in Data Lake Ecosystems
4.1. Definitions and concepts
4.2. Classification of metadata by NISO
4.2.1. Metadata schema
4.2.2. Knowledge base and catalog
4.3. Other categories of metadata
4.3.1. Business metadata
4.3.2. Navigational integration
4.3.3. Operational metadata
4.4. Sources of metadata
4.5. Metadata classification
4.6. Why metadata are needed
4.6.1. Selection of information (re)sources
4.6.2. Organization of information resources
4.6.3. Interoperability and integration
4.6.4. Unique digital identification
4.6.5. Data archiving and preservation
4.7. Business value of metadata
4.8. Metadata architecture
4.8.1. Architecture scenario 1: point-to-point metadata architecture
4.8.2. Architecture scenario 2: hub and spoke metadata architecture
4.8.3. Architecture scenario 3: tool of record metadata architecture
4.8.4. Architecture scenario 4: hybrid metadata architecture
4.8.5. Architecture scenario 5: federated metadata architecture
4.9. Metadata management
4.10. Metadata and data lakes
4.10.1. Application and workload layer
4.10.2. Data layer
4.10.3. System layer
4.10.4. Metadata types
4.11. Metadata management in data lakes
4.11.1. Metadata directory
4.11.2. Metadata storage
4.11.3. Metadata discovery
4.11.4. Metadata lineage
4.11.5. Metadata querying
4.11.6. Data source selection
4.12. Metadata and master data management
4.13. Conclusion
5. A Use Case of Data Lake Metadata Management
5.1. Context

Citations

APA Citation, 7th Edition (style guide)

Laurent, A., Laurent, D., & Madera, C. (2020). Data lakes. ISTE, Ltd.; Wiley.

Chicago / Turabian - Author Date Citation, 17th Edition (style guide)

Laurent, Anne, Dominique Laurent, and Cédrine Madera. 2020. Data Lakes. London: ISTE, Ltd.; Hoboken: Wiley.

Chicago / Turabian - Humanities (Notes and Bibliography) Citation, 17th Edition (style guide)

Laurent, Anne, Dominique Laurent, and Cédrine Madera. Data Lakes. London: ISTE, Ltd.; Hoboken: Wiley, 2020.

Harvard Citation (style guide)

Laurent, A., Laurent, D. and Madera, C. (2020). Data lakes. London: ISTE, Ltd.; Hoboken: Wiley.

MLA Citation, 9th Edition (style guide)

Laurent, Anne, Dominique Laurent, and Cédrine Madera. Data Lakes. ISTE, Ltd.; Wiley, 2020.

Note! Citations contain only title, author, edition, publisher, and year published. Citations should be used as a guideline and should be double checked for accuracy. Citation formats are based on standards as of August 2021.

Staff View

Grouped Work ID
06bb13ac-9c07-7641-59c9-af105312016a-eng

Grouping Information

Full title: data lakes
Author: anne laurent dominique laurent cédrine madera
Grouping Category: book
Last Update: 2025-01-24 12:33:29 PM
Last Indexed: 2025-05-22 03:01:35 AM

Book Cover Information

Image Source: contentCafe
First Loaded: Aug 5, 2023
Last Used: Jan 22, 2025

Marc Record

First Detected: Mar 22, 2023 08:34:25 AM
Last File Modification Time: Dec 17, 2024 08:12:01 AM
Suppressed: Record had no items

MARC Record

LEADER 06070cam a2200577 a 4500
001 on1151184484
003 OCoLC
005 20241217081018.0
006 m     o  d        
007 cr un|---aucuu
008 200418s2020    enk     ob    001 0 eng d
020 |a 9781119720430|q (electronic bk. ;|q oBook)
020 |a 1119720435|q (electronic bk. ;|q oBook)
020 |a 1119720427
020 |a 9781119720423|q (electronic bk.)
035 |a (OCoLC)1151184484
040 |a EBLCP|b eng|e pn|c EBLCP|d DG1|d OCLCO|d EBLCP|d UKAHL|d OCLCF|d OCLCQ|d S2H|d TOH|d N$T|d K6U|d OCLCO|d OCLCQ|d SFB|d OCLCQ|d OCLCO|d OCLCL|d OCLCQ|d EMRUN|d OCLCQ
049 |a MAIN
050 4|a QA76.9.B45
08204|a 005.7|2 23
24500|a Data lakes /|c edited by Anne Laurent, Dominique Laurent, Cédrine Madera.
260 |a London :|b ISTE, Ltd. ;|a Hoboken :|b Wiley,|c 2020.
300 |a 1 online resource (249 pages)
336 |a text|b txt|2 rdacontent
337 |a computer|b c|2 rdamedia
338 |a online resource|b cr|2 rdacarrier
4901 |a Computer engineering series, databases and big data set ;|v volume 2
500 |a 5.1.1. Data lake definition
504 |a Includes bibliographical references and index.
5050 |a Cover -- Half-Title Page -- Dedication -- Title Page -- Copyright Page -- Contents -- Preface -- 1. Introduction to Data Lakes: Definitions and Discussions -- 1.1. Introduction to data lakes -- 1.2. Literature review and discussion -- 1.3. The data lake challenges -- 1.4. Data lakes versus decision-making systems -- 1.5. Urbanization for data lakes -- 1.6. Data lake functionalities -- 1.7. Summary and concluding remarks -- 2. Architecture of Data Lakes -- 2.1. Introduction -- 2.2. State of the art and practice -- 2.2.1. Definition -- 2.2.2. Architecture -- 2.2.3. Metadata
5058 |a 2.2.4. Data quality -- 2.2.5. Schema-on-read -- 2.3. System architecture -- 2.3.1. Ingestion layer -- 2.3.2. Storage layer -- 2.3.3. Transformation layer -- 2.3.4. Interaction layer -- 2.4. Use case: the Constance system -- 2.4.1. System overview -- 2.4.2. Ingestion layer -- 2.4.3. Maintenance layer -- 2.4.4. Query layer -- 2.4.5. Data quality control -- 2.4.6. Extensibility and flexibility -- 2.5. Concluding remarks -- 3. Exploiting Software Product Lines and Formal Concept Analysis for the Design of Data Lake Architectures -- 3.1. Our expectations -- 3.2. Modeling data lake functionalities
5058 |a 3.3. Building the knowledge base of industrial data lakes -- 3.4. Our formalization approach -- 3.5. Applying our approach -- 3.6. Analysis of our first results -- 3.7. Concluding remarks -- 4. Metadata in Data Lake Ecosystems -- 4.1. Definitions and concepts -- 4.2. Classification of metadata by NISO -- 4.2.1. Metadata schema -- 4.2.2. Knowledge base and catalog -- 4.3. Other categories of metadata -- 4.3.1. Business metadata -- 4.3.2. Navigational integration -- 4.3.3. Operational metadata -- 4.4. Sources of metadata -- 4.5. Metadata classification -- 4.6. Why metadata are needed
5058 |a 4.6.1. Selection of information (re)sources -- 4.6.2. Organization of information resources -- 4.6.3. Interoperability and integration -- 4.6.4. Unique digital identification -- 4.6.5. Data archiving and preservation -- 4.7. Business value of metadata -- 4.8. Metadata architecture -- 4.8.1. Architecture scenario 1: point-to-point metadata architecture -- 4.8.2. Architecture scenario 2: hub and spoke metadata architecture -- 4.8.3. Architecture scenario 3: tool of record metadata architecture -- 4.8.4. Architecture scenario 4: hybrid metadata architecture
5058 |a 4.8.5. Architecture scenario 5: federated metadata architecture -- 4.9. Metadata management -- 4.10. Metadata and data lakes -- 4.10.1. Application and workload layer -- 4.10.2. Data layer -- 4.10.3. System layer -- 4.10.4. Metadata types -- 4.11. Metadata management in data lakes -- 4.11.1. Metadata directory -- 4.11.2. Metadata storage -- 4.11.3. Metadata discovery -- 4.11.4. Metadata lineage -- 4.11.5. Metadata querying -- 4.11.6. Data source selection -- 4.12. Metadata and master data management -- 4.13. Conclusion -- 5. A Use Case of Data Lake Metadata Management -- 5.1. Context
520 |a The concept of a data lake is less than 10 years old, but they are already hugely implemented within large companies. Their goal is to efficiently deal with ever-growing volumes of heterogeneous data, while also facing various sophisticated user needs. However, defining and building a data lake is still a challenge, as no consensus has been reached so far. Data Lakes presents recent outcomes and trends in the field of data repositories. The main topics discussed are the data-driven architecture of a data lake; the management of metadata - supplying key information about the stored data, master data and reference data; the roles of linked data and fog computing in a data lake ecosystem; and how gravity principles apply in the context of data lakes. A variety of case studies are also presented, thus providing the reader with practical examples of data lake management.
5880 |a Print version record.
590 |a O'Reilly|b O'Reilly Online Learning: Academic/Public Library Edition
650 0|a Big data.|9 403931
650 0|a Databases.
7001 |a Laurent, Anne,|d 1976-|1 https://id.oclc.org/worldcat/entity/E39PCjBHM7Y4PBqD7WDBWGqM6C
7001 |a Laurent, Dominique.
7001 |a Madera, Cédrine.
77608|i Print version:|a Laurent, Anne.|t Data Lakes.|d Newark : John Wiley & Sons, Incorporated, ©2020|z 9781786305855
830 0|a Computer engineering series.|p Databases and big data set ;|v volume 2.
85640|u https://library.access.arlingtonva.us/login?url=https://learning.oreilly.com/library/view/~/9781786305855/?ar|x O'Reilly|z eBook
938 |a Askews and Holts Library Services|b ASKH|n AH37732084
938 |a Askews and Holts Library Services|b ASKH|n AH37348401
938 |a ProQuest Ebook Central|b EBLB|n EBL6173691
938 |a EBSCOhost|b EBSC|n 2436380
994 |a 92|b VIA
999 |c 290969|d 290969