Data lakes
Series
Computer engineering series, databases and big data set ; volume 2
Published
London : ISTE, Ltd. ; Hoboken : Wiley, 2020.
Status
Available Online
Description
The concept of a data lake is less than 10 years old, but they are already hugely implemented within large companies. Their goal is to efficiently deal with ever-growing volumes of heterogeneous data, while also facing various sophisticated user needs. However, defining and building a data lake is still a challenge, as no consensus has been reached so far. Data Lakes presents recent outcomes and trends in the field of data repositories. The main topics discussed are the data-driven architecture of a data lake; the management of metadata - supplying key information about the stored data, master data and reference data; the roles of linked data and fog computing in a data lake ecosystem; and how gravity principles apply in the context of data lakes. A variety of case studies are also presented, thus providing the reader with practical examples of data lake management.
More Details
Format
Language
English
ISBN
9781119720430, 1119720435, 1119720427, 9781119720423
Notes
General Note
5.1.1. Data lake definition
Bibliography
Includes bibliographical references and index.
Local note
O'Reilly O'Reilly Online Learning: Academic/Public Library Edition
Table of Contents
Cover
Half-Title Page
Dedication
Title Page
Copyright Page
Contents
Preface
1. Introduction to Data Lakes: Definitions and Discussions
1.1. Introduction to data lakes
1.2. Literature review and discussion
1.3. The data lake challenges
1.4. Data lakes versus decision-making systems
1.5. Urbanization for data lakes
1.6. Data lake functionalities
1.7. Summary and concluding remarks
2. Architecture of Data Lakes
2.1. Introduction
2.2. State of the art and practice
2.2.1. Definition
2.2.2. Architecture
2.2.3. Metadata
2.2.4. Data quality
2.2.5. Schema-on-read
2.3. System architecture
2.3.1. Ingestion layer
2.3.2. Storage layer
2.3.3. Transformation layer
2.3.4. Interaction layer
2.4. Use case: the Constance system
2.4.1. System overview
2.4.2. Ingestion layer
2.4.3. Maintenance layer
2.4.4. Query layer
2.4.5. Data quality control
2.4.6. Extensibility and flexibility
2.5. Concluding remarks
3. Exploiting Software Product Lines and Formal Concept Analysis for the Design of Data Lake Architectures
3.1. Our expectations
3.2. Modeling data lake functionalities
3.3. Building the knowledge base of industrial data lakes
3.4. Our formalization approach
3.5. Applying our approach
3.6. Analysis of our first results
3.7. Concluding remarks
4. Metadata in Data Lake Ecosystems
4.1. Definitions and concepts
4.2. Classification of metadata by NISO
4.2.1. Metadata schema
4.2.2. Knowledge base and catalog
4.3. Other categories of metadata
4.3.1. Business metadata
4.3.2. Navigational integration
4.3.3. Operational metadata
4.4. Sources of metadata
4.5. Metadata classification
4.6. Why metadata are needed
4.6.1. Selection of information (re)sources
4.6.2. Organization of information resources
4.6.3. Interoperability and integration
4.6.4. Unique digital identification
4.6.5. Data archiving and preservation
4.7. Business value of metadata
4.8. Metadata architecture
4.8.1. Architecture scenario 1: point-to-point metadata architecture
4.8.2. Architecture scenario 2: hub and spoke metadata architecture
4.8.3. Architecture scenario 3: tool of record metadata architecture
4.8.4. Architecture scenario 4: hybrid metadata architecture
4.8.5. Architecture scenario 5: federated metadata architecture
4.9. Metadata management
4.10. Metadata and data lakes
4.10.1. Application and workload layer
4.10.2. Data layer
4.10.3. System layer
4.10.4. Metadata types
4.11. Metadata management in data lakes
4.11.1. Metadata directory
4.11.2. Metadata storage
4.11.3. Metadata discovery
4.11.4. Metadata lineage
4.11.5. Metadata querying
4.11.6. Data source selection
4.12. Metadata and master data management
4.13. Conclusion
5. A Use Case of Data Lake Metadata Management
5.1. Context
Subjects
LC Subjects
Big data.
Databases.
Citations
APA Citation, 7th Edition
Laurent, A., Laurent, D., & Madera, C. (2020). Data lakes. ISTE, Ltd.; Wiley.
Chicago / Turabian - Author Date Citation, 17th Edition
Laurent, Anne, Dominique Laurent, and Cédrine Madera. 2020. Data Lakes. London: ISTE, Ltd.; Hoboken: Wiley.
Chicago / Turabian - Humanities (Notes and Bibliography) Citation, 17th Edition
Laurent, Anne, Dominique Laurent, and Cédrine Madera. Data Lakes. London: ISTE, Ltd.; Hoboken: Wiley, 2020.
Harvard Citation
Laurent, A., Laurent, D. and Madera, C. (2020). Data lakes. London: ISTE, Ltd.; Hoboken: Wiley.
MLA Citation, 9th Edition
Laurent, Anne, Dominique Laurent, and Cédrine Madera. Data Lakes. ISTE, Ltd.; Wiley, 2020.
Note! Citations contain only title, author, edition, publisher, and year published. Citations should be used as a guideline and should be double checked for accuracy. Citation formats are based on standards as of August 2021.
Staff View
Grouping Information
Grouped Work ID | 06bb13ac-9c07-7641-59c9-af105312016a-eng |
---|---|
Full title | data lakes |
Author | anne laurent dominique laurent cédrine madera |
Grouping Category | book |
Last Update | 2025-01-24 12:33:29PM |
Last Indexed | 2025-05-22 03:01:35AM |
Book Cover Information
Image Source | contentCafe |
---|---|
First Loaded | Aug 5, 2023 |
Last Used | Jan 22, 2025 |
Marc Record
First Detected | Mar 22, 2023 08:34:25 AM |
---|---|
Last File Modification Time | Dec 17, 2024 08:12:01 AM |
Suppressed | Record had no items |
MARC Record
LEADER | 06070cam a2200577 a 4500 | ||
---|---|---|---|
001 | on1151184484 | ||
003 | OCoLC | ||
005 | 20241217081018.0 | ||
006 | m o d | ||
007 | cr un|---aucuu | ||
008 | 200418s2020 enk ob 001 0 eng d | ||
020 | |a 9781119720430|q (electronic bk. ;|q oBook) | ||
020 | |a 1119720435|q (electronic bk. ;|q oBook) | ||
020 | |a 1119720427 | ||
020 | |a 9781119720423|q (electronic bk.) | ||
035 | |a (OCoLC)1151184484 | ||
040 | |a EBLCP|b eng|e pn|c EBLCP|d DG1|d OCLCO|d EBLCP|d UKAHL|d OCLCF|d OCLCQ|d S2H|d TOH|d N$T|d K6U|d OCLCO|d OCLCQ|d SFB|d OCLCQ|d OCLCO|d OCLCL|d OCLCQ|d EMRUN|d OCLCQ | ||
049 | |a MAIN | ||
050 | 4 | |a QA76.9.B45 | |
082 | 0 | 4 | |a 005.7|2 23 |
245 | 0 | 0 | |a Data lakes /|c edited by Anne Laurent, Dominique Laurent, Cédrine Madera. |
260 | |a London :|b ISTE, Ltd. ;|a Hoboken :|b Wiley,|c 2020. | ||
300 | |a 1 online resource (249 pages) | ||
336 | |a text|b txt|2 rdacontent | ||
337 | |a computer|b c|2 rdamedia | ||
338 | |a online resource|b cr|2 rdacarrier | ||
490 | 1 | |a Computer engineering series, databases and big data set ;|v volume 2 | |
500 | |a 5.1.1. Data lake definition | ||
504 | |a Includes bibliographical references and index. | ||
505 | 0 | |a Cover -- Half-Title Page -- Dedication -- Title Page -- Copyright Page -- Contents -- Preface -- 1. Introduction to Data Lakes: Definitions and Discussions -- 1.1. Introduction to data lakes -- 1.2. Literature review and discussion -- 1.3. The data lake challenges -- 1.4. Data lakes versus decision-making systems -- 1.5. Urbanization for data lakes -- 1.6. Data lake functionalities -- 1.7. Summary and concluding remarks -- 2. Architecture of Data Lakes -- 2.1. Introduction -- 2.2. State of the art and practice -- 2.2.1. Definition -- 2.2.2. Architecture -- 2.2.3. Metadata | |
505 | 8 | |a 2.2.4. Data quality -- 2.2.5. Schema-on-read -- 2.3. System architecture -- 2.3.1. Ingestion layer -- 2.3.2. Storage layer -- 2.3.3. Transformation layer -- 2.3.4. Interaction layer -- 2.4. Use case: the Constance system -- 2.4.1. System overview -- 2.4.2. Ingestion layer -- 2.4.3. Maintenance layer -- 2.4.4. Query layer -- 2.4.5. Data quality control -- 2.4.6. Extensibility and flexibility -- 2.5. Concluding remarks -- 3. Exploiting Software Product Lines and Formal Concept Analysis for the Design of Data Lake Architectures -- 3.1. Our expectations -- 3.2. Modeling data lake functionalities | |
505 | 8 | |a 3.3. Building the knowledge base of industrial data lakes -- 3.4. Our formalization approach -- 3.5. Applying our approach -- 3.6. Analysis of our first results -- 3.7. Concluding remarks -- 4. Metadata in Data Lake Ecosystems -- 4.1. Definitions and concepts -- 4.2. Classification of metadata by NISO -- 4.2.1. Metadata schema -- 4.2.2. Knowledge base and catalog -- 4.3. Other categories of metadata -- 4.3.1. Business metadata -- 4.3.2. Navigational integration -- 4.3.3. Operational metadata -- 4.4. Sources of metadata -- 4.5. Metadata classification -- 4.6. Why metadata are needed | |
505 | 8 | |a 4.6.1. Selection of information (re)sources -- 4.6.2. Organization of information resources -- 4.6.3. Interoperability and integration -- 4.6.4. Unique digital identification -- 4.6.5. Data archiving and preservation -- 4.7. Business value of metadata -- 4.8. Metadata architecture -- 4.8.1. Architecture scenario 1: point-to-point metadata architecture -- 4.8.2. Architecture scenario 2: hub and spoke metadata architecture -- 4.8.3. Architecture scenario 3: tool of record metadata architecture -- 4.8.4. Architecture scenario 4: hybrid metadata architecture | |
505 | 8 | |a 4.8.5. Architecture scenario 5: federated metadata architecture -- 4.9. Metadata management -- 4.10. Metadata and data lakes -- 4.10.1. Application and workload layer -- 4.10.2. Data layer -- 4.10.3. System layer -- 4.10.4. Metadata types -- 4.11. Metadata management in data lakes -- 4.11.1. Metadata directory -- 4.11.2. Metadata storage -- 4.11.3. Metadata discovery -- 4.11.4. Metadata lineage -- 4.11.5. Metadata querying -- 4.11.6. Data source selection -- 4.12. Metadata and master data management -- 4.13. Conclusion -- 5. A Use Case of Data Lake Metadata Management -- 5.1. Context | |
520 | |a The concept of a data lake is less than 10 years old, but they are already hugely implemented within large companies. Their goal is to efficiently deal with ever-growing volumes of heterogeneous data, while also facing various sophisticated user needs. However, defining and building a data lake is still a challenge, as no consensus has been reached so far. Data Lakes presents recent outcomes and trends in the field of data repositories. The main topics discussed are the data-driven architecture of a data lake; the management of metadata - supplying key information about the stored data, master data and reference data; the roles of linked data and fog computing in a data lake ecosystem; and how gravity principles apply in the context of data lakes. A variety of case studies are also presented, thus providing the reader with practical examples of data lake management. | ||
588 | 0 | |a Print version record. | |
590 | |a O'Reilly|b O'Reilly Online Learning: Academic/Public Library Edition | ||
650 | 0 | |a Big data.|9 403931 | |
650 | 0 | |a Databases. | |
700 | 1 | |a Laurent, Anne,|d 1976-|1 https://id.oclc.org/worldcat/entity/E39PCjBHM7Y4PBqD7WDBWGqM6C | |
700 | 1 | |a Laurent, Dominique. | |
700 | 1 | |a Madera, Cédrine. | |
776 | 0 | 8 | |i Print version:|a Laurent, Anne.|t Data Lakes.|d Newark : John Wiley & Sons, Incorporated, ©2020|z 9781786305855 |
830 | 0 | |a Computer engineering series.|p Databases and big data set ;|v volume 2. | |
856 | 4 | 0 | |u https://library.access.arlingtonva.us/login?url=https://learning.oreilly.com/library/view/~/9781786305855/?ar|x O'Reilly|z eBook |
938 | |a Askews and Holts Library Services|b ASKH|n AH37732084 | ||
938 | |a Askews and Holts Library Services|b ASKH|n AH37348401 | ||
938 | |a ProQuest Ebook Central|b EBLB|n EBL6173691 | ||
938 | |a EBSCOhost|b EBSC|n 2436380 | ||
994 | |a 92|b VIA | ||
999 | |c 290969|d 290969 |