Data Swamp
- Categories: #file-organization #dropbox
Overview
Adoption of data lakes, or ad hoc shared repositories of heterogeneous data, has grown in recent years because of the low barrier to storing and analyzing data. However, because they lack metadata and provenance information, these systems often become messy and disorganized, which complicates data management tasks such as data discovery, file co-management, and lineage tracking. As part of this project, we are building proactive and retrospective tools to aid in the organization of data lakes.