Data Engineering · Data Quality

Constellab · gws_great_expectations

A GWS brick that runs Great Expectations inside GWS workflows: it validates tabular datasets, generates Data Docs, and publishes them directly in the platform, with dependencies pinned for compatibility with the GWS runtime.

Client: Constellab
Role: Data engineering & integration
Focus: Validation workflow in GWS
Stack: Python, gws_core, great-expectations==0.18.21, Pandas

Context

In many GWS environments, data quality checks were either externalized or implemented with ad hoc scripts, which made validation harder to standardize and less visible for non-technical teams. This project was built to make quality control a native part of platform workflows.

Project objective

The goal was to bring Great Expectations into GWS in a production-friendly way: reusable validation tasks, clear deliverables for data/quality teams, and execution stability in the actual runtime used by projects.
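
As an illustration of what "execution stability in the actual runtime" can involve, the following is a minimal sketch of a startup check that compares the versions the runtime reports against the pinned ones. It is not taken from the brick: the names `PINNED`, `installed_version`, and `check_pins` are hypothetical, and only the `great-expectations==0.18.21` pin comes from the project stack.

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional

# Pin taken from the project stack; the dictionary name is hypothetical.
PINNED: Dict[str, str] = {"great-expectations": "0.18.21"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[str]:
    """Compare each pinned version with what the runtime reports.

    Returns a list of human-readable mismatches; an empty list means
    every pinned package is installed at the expected version.
    """
    problems = []
    for package, expected in pins.items():
        found = lookup(package)
        if found is None:
            problems.append(f"{package} is not installed (expected {expected})")
        elif found != expected:
            problems.append(f"{package}=={found} installed, expected {expected}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINNED) or ["all pinned dependencies match"]:
        print(line)
```

The `lookup` parameter is injectable so the check can be exercised without installing anything; in a real environment the default reads the installed package metadata.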

Main contributions

  • Enabled Great Expectations directly inside GWS pipelines for tabular resources and CSV folders.
  • Integrated automatic publication of Data Docs as a native GWS resource (GxDataDocsResource).
  • Defined configurable validation behaviors to support both generic and business-specific quality rules.
  • Delivered specialized task variants for domain contexts, including CDISC and NGS workflows.
  • Secured runtime compatibility by aligning dependencies around great-expectations==0.18.21.
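
The idea of configurable validation behaviors covering both generic and business-specific rules can be sketched with the standard library alone. This is a conceptual illustration, not the brick's implementation and not the Great Expectations API: the rule helpers, the `validate` function, and the sample columns (`customer_id`, `age`, `country`) are all assumptions made for the example.

```python
import csv
import io
from typing import Callable, Dict, List, Optional, Set, Tuple

# A rule is (column name, description, predicate on the raw cell value).
Predicate = Callable[[Optional[str]], bool]
Rule = Tuple[str, str, Predicate]


def not_null(value: Optional[str]) -> bool:
    """Generic rule: the cell must be present and non-empty."""
    return value is not None and value.strip() != ""


def between(low: float, high: float) -> Predicate:
    """Generic rule factory: the cell must parse as a number within [low, high]."""
    def check(value: Optional[str]) -> bool:
        try:
            return low <= float(value) <= high
        except (TypeError, ValueError):
            return False
    return check


def in_set(allowed: Set[str]) -> Predicate:
    """Business rule factory: the cell must be one of the allowed codes."""
    return lambda value: value in allowed


def validate(rows: List[Dict[str, str]], rules: List[Rule]) -> List[dict]:
    """Apply every rule to every row and report the failing row indices."""
    results = []
    for column, description, predicate in rules:
        failed = [i for i, row in enumerate(rows) if not predicate(row.get(column))]
        results.append({
            "column": column,
            "rule": description,
            "success": not failed,
            "failed_rows": failed,
        })
    return results


# Hypothetical sample data; the columns are invented for this sketch.
SAMPLE_CSV = "customer_id,age,country\nC001,34,FR\nC002,,DE\nC003,140,FR\n"

RULES: List[Rule] = [
    ("customer_id", "not null", not_null),
    ("age", "between 0 and 120", between(0, 120)),
    ("country", "in allowed country codes", in_set({"FR", "DE"})),
]

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    for result in validate(rows, RULES):
        status = "PASS" if result["success"] else "FAIL"
        print(f'{status} {result["column"]}: {result["rule"]} {result["failed_rows"]}')
```

Keeping the rules as plain data makes it straightforward to swap a generic set for a domain-specific one without touching the validation loop, which is the kind of configurability the task variants are described as offering.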

Project architecture

  • src/gws_great_expectations/gx_data_docs_demo.py: core implementation and GWS task logic.
  • src/gws_great_expectations/__init__.py: public exports for task registration.
  • example_input_gx_customer_quality.csv: representative input dataset used for demonstration.

Project impact

Great Expectations becomes a first-class capability in GWS: validation, reporting, and visibility are unified in one platform experience for engineering and quality teams.
