How data moves through the Data & More GDPR compliance platform: from source ingestion through automated classification to policy enforcement and audit.
1. External Data Sources
Origin of customer data to be governed
Microsoft 365
Outlook / SharePoint / OneDrive / Teams
Primary enterprise data source covering email, documents, file storage, and team communication channels.
Protocol: EWS / Graph API
Data: Emails, attachments, documents, chat
Auth: OAuth 2.0 / Azure AD
Google Workspace
Gmail / Drive
Protocol: Google APIs
Auth: OAuth 2.0 / Service Account
Network Fileshare
SMB / NFS
Data: Files, directories, permissions
Azure AD
Users / Groups
Protocol: Microsoft Graph API
Data: Identities, group memberships
Websites
HTTP crawl
Data: Web pages, published content
API calls and crawling; OAuth tokens; incremental sync
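The incremental sync link implies that connectors pull only items changed since their previous run. A minimal sketch of that idea, assuming a hypothetical `fetch_changes(cursor)` callable that returns changed items plus a new cursor (similar in spirit to Microsoft Graph delta queries or Google Drive change tokens). `SyncState`, `incremental_sync`, and the `id`/`version` fields are illustrative names, not the connectors' actual code:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical source API: given the last cursor (None on first run),
# return the items changed since then and a cursor for the next run.
FetchChanges = Callable[[Optional[str]], Tuple[List[dict], str]]


@dataclass
class SyncState:
    """Per-source state a connector would persist between runs (illustrative)."""
    cursor: Optional[str] = None
    seen: Dict[str, str] = field(default_factory=dict)  # item id -> last version


def incremental_sync(fetch_changes: FetchChanges, state: SyncState) -> List[dict]:
    """Return only items whose version differs from the last one ingested."""
    items, next_cursor = fetch_changes(state.cursor)
    changed = []
    for item in items:
        # Skip items the source re-reports without a real change.
        if state.seen.get(item["id"]) != item["version"]:
            state.seen[item["id"]] = item["version"]
            changed.append(item)
    # Advance the cursor only after the batch has been processed.
    state.cursor = next_cursor
    return changed
```

Keeping a per-item version alongside the cursor makes the sketch tolerant of sources that occasionally resend unchanged items; only genuinely new or modified items would be passed on for indexing.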
2. Connector Services
Source-specific ingestion adapters
ews
Java / Spring Boot
Exchange Web Services connector for Outlook, OneDrive, SharePoint, and Teams. Supports crawl, move, delete, validate, and revert actions.
Language: Java (Spring Boot)
Output: Elasticsearch
graph-ingestion
Scala / Maven
Microsoft Graph API ingestion, split into three sub-services: graph-management, dm-graph-ingestion, and graph-enforcer.
Connects: ES + RabbitMQ
google
Python
Google Workspace ingestion: Gmail and Google Drive scanning.
Output: Elasticsearch
fileshare-service
Java
Network file scanning via SMB/NFS.
Output: Elasticsearch
collector
Java / Gradle
Universal batch collector for generic source types.
Output: API, then Elasticsearch
website-source
Python
HTTP/HTTPS web crawling service.
Output: ES + RabbitMQ
HTTP batch ingestion; direct ES indexing
3. Core API Layer
Central orchestrator and auth
api
Python / Flask :8000
REST API with 20+ blueprint modules: documents, sources, policies, tags, reports, users, configuration, and more.
Port: 8000
Connects: ES, PostgreSQL, RabbitMQ, IAM
task-worker
Python / Celery
Async job processor via RabbitMQ broker.
Broker: RabbitMQ (AMQP :5672)
Monitor: Flower (:5555)
iam
Python / Flask :5000
Identity and Access Management with JWT authentication and company-scoped isolation.
Port: 5000
Connects: PostgreSQL
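Company-scoped isolation means a token that verifies correctly must still be rejected when it belongs to a different tenant. The sketch below shows that check with HS256 JWTs using only the standard library. The `company_id` claim name, the shared-secret setup, and the function names are assumptions for illustration; a production service would normally rely on a maintained JWT library rather than hand-rolled parsing.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    # JWT segments are unpadded base64url.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding stripped by the encoder before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue an HS256 token (helper so the sketch can be exercised)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_company_scope(token: str, secret: bytes, company_id: str,
                         now: Optional[float] = None) -> dict:
    """Check signature and expiry, then the hypothetical company_id claim."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise PermissionError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and claims["exp"] < current:
        raise PermissionError("token expired")
    # Tenant isolation: a valid token from another company is still refused.
    if claims.get("company_id") != company_id:
        raise PermissionError("cross-company access denied")
    return claims
```

Placing the tenant check after signature verification ensures the `company_id` value being compared has not been tampered with.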
dlp
Python / Flask + Celery :5050
Data Loss Prevention service with dedicated PostgreSQL database (dlp_dam) and RabbitMQ connection.
Port: 5050
Connects: ES, PostgreSQL, RabbitMQ
chat-api
Python
LLM-powered, chat-based Q&A over indexed documents to support GDPR compliance work.
Connects: ES, IAM, LLM-management
Celery tasks via AMQP :5672; periodic scheduled jobs
4. Processing Workers
Celery tasks consumed from RabbitMQ
Classification
Celery task
PII detection, sensitivity scoring, language identification.
Tasks: classification_base, global_classification
Output: DS_PiiScore, DS_SensitivityScore
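To illustrate how a PII score and a sensitivity level could be derived from document text, here is a minimal rule-based sketch. The regex detectors, their weights, the 0.7 threshold, and the three-level bucket are all hypothetical stand-ins for whatever `classification_base` actually computes into `DS_PiiScore` and `DS_SensitivityScore`; real PII detection would add checksum validation (for example IBAN mod-97) and NLP models.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Hypothetical detectors: (compiled pattern, weight per match).
PII_PATTERNS = {
    "email": (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), 0.3),
    "iban": (re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"), 0.5),
    "phone": (re.compile(r"\+\d{1,3}[\s-]?\d{2,4}[\s-]?\d{3,4}[\s-]?\d{3,4}"), 0.2),
}


@dataclass
class ClassificationResult:
    pii_counts: Dict[str, int]
    pii_score: float   # stand-in for DS_PiiScore, clamped to [0, 1]
    sensitivity: str   # stand-in for a DS_SensitivityScore bucket


def classify(text: str) -> ClassificationResult:
    """Count matches per detector and turn the weighted sum into a score."""
    counts = {name: len(rx.findall(text)) for name, (rx, _) in PII_PATTERNS.items()}
    raw = sum(weight * counts[name] for name, (_, weight) in PII_PATTERNS.items())
    score = min(1.0, raw)
    if score >= 0.7:
        level = "high"
    elif score > 0:
        level = "medium"
    else:
        level = "low"
    return ClassificationResult(counts, score, level)
```

A weighted sum lets rare but highly sensitive identifiers (such as bank account numbers) dominate the score, while frequent low-risk matches like email addresses raise it more gradually.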
ai-profiler
Python / spaCy / Celery
NLP-based entity extraction and document profiling.
Output: ES profiler results
Policy Enforcement
Celery task
Applies retention rules (delete, archive, or quarantine) and records every action in a full audit trail.
Tasks: enforce, update_policies, purge
Audit: deleted_data ES index
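A retention policy pairs a document class with a maximum age and an action, and every action taken should leave an audit record. The sketch below assumes one rule per document class and documents carrying hypothetical `id`, `document_class`, and timezone-aware `modified` fields; the audit dictionaries only suggest what an entry in the `deleted_data` index might hold, since the real schema is not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Iterable, List, Optional, Tuple


class Action(Enum):
    DELETE = "delete"
    ARCHIVE = "archive"
    QUARANTINE = "quarantine"


@dataclass
class RetentionRule:
    document_class: str   # hypothetical: matched against a DS_DocumentClass-style field
    max_age: timedelta
    action: Action


def enforce(documents: Iterable[dict], rules: Iterable[RetentionRule],
            now: Optional[datetime] = None) -> Tuple[List[tuple], List[dict]]:
    """Return (planned actions, audit records) for documents past retention."""
    now = now or datetime.now(timezone.utc)
    by_class = {r.document_class: r for r in rules}  # one rule per class
    actions, audit = [], []
    for doc in documents:
        rule = by_class.get(doc["document_class"])
        # No rule for this class, or still within its retention window.
        if rule is None or now - doc["modified"] <= rule.max_age:
            continue
        actions.append((doc["id"], rule.action))
        audit.append({
            "doc_id": doc["id"],
            "action": rule.action.value,
            "rule": rule.document_class,
            "at": now.isoformat(),
        })
    return actions, audit
```

Separating "decide" from "execute" in this way would let a worker write the audit records before performing destructive operations, so a crash midway still leaves a trace of what was intended.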
Tag Assignment
Celery task
Delta tagging and document class assignment.
Output: DS_Tags, DS_DocumentClass
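Delta tagging can be read as writing only what changed: compute the tags to add and remove, and skip documents whose tags and class are already correct. A minimal sketch, assuming documents are plain dicts keyed by the `DS_Tags` and `DS_DocumentClass` field names listed above; `tag_delta` and `build_update` are hypothetical helpers, and the returned dict would be the body of a partial document update.

```python
from typing import Optional, Set, Tuple


def tag_delta(current: Set[str], desired: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Tags to add and tags to remove so that `current` becomes `desired`."""
    return desired - current, current - desired


def build_update(doc: dict, desired_tags: Set[str], desired_class: str) -> Optional[dict]:
    """Partial-update body, or None when the document is already up to date."""
    current = set(doc.get("DS_Tags", []))
    added, removed = tag_delta(current, desired_tags)
    update = {}
    if added or removed:
        # Sorted for deterministic output; Elasticsearch arrays are unordered anyway.
        update["DS_Tags"] = sorted(desired_tags)
    if doc.get("DS_DocumentClass") != desired_class:
        update["DS_DocumentClass"] = desired_class
    return update or None
```

Returning `None` for unchanged documents keeps re-tagging runs cheap: only the documents whose delta is non-empty would be sent back to the index.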
ocr
Python
Text extraction from scanned documents and images.
Output: Enriched document in ES
DataSubject Manager
Java
VIP and entity matching against a database of known persons.
Output: ES data subject metadata
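Matching documents against known persons usually starts with normalizing names so that case, accents, and spacing differences do not cause misses. The service itself is written in Java; this Python sketch only illustrates the idea. The `known_persons` mapping (person id to name variants) and both function names are hypothetical, and a real matcher would also handle nicknames, transliterations, and fuzzy matches.

```python
import re
import unicodedata
from typing import Dict, List


def normalize_name(text: str) -> str:
    """Strip accents, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def match_data_subjects(text: str, known_persons: Dict[str, List[str]]) -> List[str]:
    """Return ids of known persons with at least one name variant in `text`."""
    haystack = normalize_name(text)
    hits = []
    for person_id, variants in known_persons.items():
        for variant in variants:
            # Word boundaries prevent "ann" from matching inside "joanna".
            pattern = r"\b" + re.escape(normalize_name(variant)) + r"\b"
            if re.search(pattern, haystack):
                hits.append(person_id)
                break  # one hit per person is enough
    return hits
```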
ES / PG Sync
Celery task
Bidirectional Elasticsearch to PostgreSQL synchronization.
Tasks: es2postgre, syncpostgres
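Bidirectional synchronization needs a rule for deciding which side wins when a record exists in both stores. The sketch below uses last-write-wins on a hypothetical `updated_at` timestamp; it is one common policy, not necessarily the one `es2postgre`/`syncpostgres` implement, and it deliberately ignores deletions, which would need tombstones to propagate safely.

```python
from datetime import datetime
from typing import Dict, Tuple

Records = Dict[str, dict]  # record id -> record with an "updated_at" datetime


def reconcile(es_docs: Records, pg_rows: Records) -> Tuple[Records, Records]:
    """Return (writes_to_postgres, writes_to_elasticsearch) by last-write-wins."""
    to_pg: Records = {}
    to_es: Records = {}
    for key in es_docs.keys() | pg_rows.keys():
        es, pg = es_docs.get(key), pg_rows.get(key)
        if pg is None or (es is not None and es["updated_at"] > pg["updated_at"]):
            to_pg[key] = es   # only in ES, or ES copy is newer
        elif es is None or pg["updated_at"] > es["updated_at"]:
            to_es[key] = pg   # only in PostgreSQL, or PostgreSQL copy is newer
        # Equal timestamps: already in sync, nothing to write.
    return to_pg, to_es
```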
Backup Worker
Celery task
Daily exports to S3 as JSON batches of 750 documents per file, kept for 180 days.
Target: S3 (180-day retention)
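The two numbers in the backup description translate directly into code: split each day's export into files of at most 750 documents, and treat any daily export older than 180 days as expired. This sketch only builds object keys and JSON bodies; the `backup/<date>/part-NNNNN.json` key layout and function names are assumptions, and the upload step (for example boto3's `put_object`) is left out so the example stays self-contained.

```python
import json
from datetime import date, timedelta
from typing import Iterable, Iterator, List, Tuple

BATCH_SIZE = 750        # documents per file, as stated above
RETENTION_DAYS = 180    # days a daily export is kept


def _key(day: date, part: int) -> str:
    # Hypothetical object-key layout: one prefix per export day.
    return f"backup/{day.isoformat()}/part-{part:05d}.json"


def batch_files(docs: Iterable[dict], day: date) -> Iterator[Tuple[str, str]]:
    """Yield (object_key, json_body) pairs of at most BATCH_SIZE documents."""
    batch: List[dict] = []
    part = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) == BATCH_SIZE:
            yield _key(day, part), json.dumps(batch)
            batch, part = [], part + 1
    if batch:  # final, partially filled file
        yield _key(day, part), json.dumps(batch)


def expired_prefixes(export_days: Iterable[date], today: date) -> List[str]:
    """Prefixes of daily exports older than the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [f"backup/{d.isoformat()}/" for d in sorted(export_days) if d < cutoff]
```

Using a date-based prefix per export also makes the retention rule expressible as an S3 lifecycle expiration policy, which would remove the need for the worker to delete old files itself.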
REST :9200; SQL :5432; AMQP :5672; S3 API
5. Data Stores
Persistence and messaging infrastructure
Elasticsearch 9.x
Primary index :9200
Document search, classification data, audit logs, and system events. The central nervous system of the platform.