Data Platform Engineer
Location: Princeton, NJ (onsite preferred)
Duration: 6 months
No third-party C2C
- Design and implement an Azure cloud-based data warehousing and governance architecture following the Lakehouse paradigm.
- Integrate technical functionality while ensuring data accessibility, accuracy, and security.
- Architect the Unity Catalog to provide centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces.
- Define and organize data assets (structured and unstructured) within the Unity Catalog.
- Enable data analysts and ETL engineers to discover and classify data, notebooks, dashboards, and files across clouds and platforms.
- Implement a single permission model for data and AI assets.
- Define access policies at a granular level (rows, columns, features) to ensure secure and consistent access management across workspaces and platforms.
- Leverage Delta Sharing to enable easy data sharing across regions and platforms.
- Ensure that data and AI assets can be securely shared with minimal replication, maintaining a unified experience for users.
- Monitoring and Observability: use AI to automate monitoring, diagnose errors, and maintain data quality.
- Set up alerts for personally identifiable information (PII) detection and operational intelligence.
- Work closely with data scientists, analysts, and engineers to promote adoption of the Unity Catalog.
- Provide training and documentation to ensure effective usage and compliance with governance policies.
Skills:
- Designed data warehouse and data lake solutions, along with data processing pipelines, using PySpark on Databricks.
- Performed data modeling on Databricks (Delta tables) for transactional and analytical needs.
- Designed and developed pipelines to load data into the data lake.
- Proficiency with the Databricks platform, including components such as Databricks SQL, Delta Live Tables, Databricks Repos, and task orchestration.
- Deep understanding of data governance principles, especially related to data cataloging, access control, lineage, and metadata management.
- Strong SQL skills for querying and managing data.
- Ability to design and optimize data models for structured and unstructured data.
- Understanding of how to manage compute resources, including clusters and workspaces.
- Ability to adapt to changes and emerging trends in data engineering and governance.
- Hands-on experience developing and configuring Unity Catalog.