Data & More · Engineering · Deep dive index

The whole platform, on one map.

Data & More is a multi-tenant GDPR and data-governance platform. It connects to an organisation's email, file storage and chat systems, reads everything it finds, classifies every document for personal and sensitive data, and helps the data owner act on it. This page is the map: what the platform does, how the parts fit, and where to read the detail.

You cannot govern what you have not first read, classified, and understood.

~42 microservices

2 core languages: Python, Java

7 pipeline stages, source to report

1 shared store: Elasticsearch

Section 01 · How it works

Classify, Verify, Delete, always on

The platform runs continuously, not on a fixed quarterly cadence. The process has a name, Classify, Verify, Delete: the platform classifies what sits in the archive, the data owner verifies the findings on their own ground, and the result is deletion (or archive, edit or restrict) carried out in the source system. Sources and exceptions are the one-time configuration that feeds the loop.

Classify: the platform's job
Verify: the data owner's job
Delete: the continuous goal

Where the data comes from

Connected sources

Each source is wired through one of the platform's ingestion connectors. The cycle starts here and writes findings back here too.

Office 365, Exchange, SharePoint, OneDrive, Teams, Outlook, Gmail, Google Drive, File shares
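As an illustration of the "one connector per source" idea, here is a minimal sketch of a shared connector contract that both reads a source and writes findings back to it. The names `SourceConnector`, `crawl`, `write_back`, `InMemoryConnector` and `run_cycle` are hypothetical and not taken from the platform; the classifier is a toy rule standing in for the real one.

```python
from dataclasses import dataclass, field
from typing import Iterator, Protocol


@dataclass
class Record:
    record_id: str
    text: str
    labels: list[str] = field(default_factory=list)


class SourceConnector(Protocol):
    """Hypothetical contract that every source connector would satisfy."""

    def crawl(self) -> Iterator[Record]: ...
    def write_back(self, record_id: str, labels: list[str]) -> None: ...


class InMemoryConnector:
    """Stand-in for a real source such as a mailbox or a file share."""

    def __init__(self, records: list[Record]) -> None:
        self._records = {r.record_id: r for r in records}

    def crawl(self) -> Iterator[Record]:
        yield from self._records.values()

    def write_back(self, record_id: str, labels: list[str]) -> None:
        self._records[record_id].labels = list(labels)


def run_cycle(connector: SourceConnector) -> int:
    """Crawl a source, label each record, and write the labels back."""
    count = 0
    for record in list(connector.crawl()):
        # Toy rule standing in for the real classifier (assumption).
        labels = ["email-address"] if "@" in record.text else []
        connector.write_back(record.record_id, labels)
        count += 1
    return count
```

Because `run_cycle` depends only on the contract, a new source would need a new connector class and nothing else in the loop would change.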

What Delete means in practice

Three ways to delete

Deletion is never automatic. The data owner reviews each finding locally and picks one of three actions, recorded in the audit log.

Restrict: the record stays put but is flagged for restricted access.
Archive (retention): the record is moved to an archive with a defined retention rule.
Delete: the record is permanently removed from the source system.
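The two rules above (nothing happens without the data owner, and every choice lands in the audit log) can be sketched as follows. This is an illustrative model only: `Action`, `AuditEntry` and `apply_action` are hypothetical names, and the real audit log format is not described on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Action(Enum):
    RESTRICT = "restrict"
    ARCHIVE = "archive"
    DELETE = "delete"


@dataclass(frozen=True)
class AuditEntry:
    record_id: str
    action: Action
    confirmed_by: str
    at: datetime


def apply_action(
    record_id: str,
    action: Action,
    confirmed_by: Optional[str],
    audit_log: list[AuditEntry],
) -> AuditEntry:
    """Record an owner-confirmed action; refuse anything unconfirmed."""
    if not confirmed_by:
        raise PermissionError("an action needs explicit data-owner confirmation")
    entry = AuditEntry(record_id, action, confirmed_by, datetime.now(timezone.utc))
    audit_log.append(entry)
    return entry
```

Making the confirmation a required argument, rather than a flag checked elsewhere, means an unconfirmed action cannot reach the log by accident.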

Section 02 · Conceptual model

How the parts fit together

Every request enters through one TLS-terminating NGINX proxy and is routed to the application tier: the Vue client, the main Flask API, and the IAM, analytics and LLM services. The API hands slow work to an asynchronous backbone of Celery workers over RabbitMQ, which also carries the scan, ingest and enforce events for the Java tier. The heavy lifting (crawling sources, extracting text, enforcing policy) runs in that Java tier. Underneath it all sits the data layer, with Elasticsearch as the document store every service shares.
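The hand-off of slow work from the API to background workers follows a common pattern: the request handler enqueues a task and returns at once, and a worker picks the task up later. The sketch below models that pattern with the Python standard library only. It is not Celery or RabbitMQ code, and `handle_request`, `worker` and the `scanned:` result are hypothetical placeholders.

```python
import queue
import threading
import uuid

task_queue = queue.Queue()          # stands in for the message broker
results: dict[str, str] = {}        # stands in for a result store


def handle_request(document_id: str) -> str:
    """API side: enqueue the slow work and return a task id immediately."""
    task_id = uuid.uuid4().hex
    task_queue.put((task_id, document_id))
    return task_id


def worker() -> None:
    """Worker side: process tasks until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        task_id, document_id = item
        results[task_id] = f"scanned:{document_id}"  # placeholder for slow work


thread = threading.Thread(target=worker)
thread.start()
task_id = handle_request("doc-42")
task_queue.put(None)  # tell the worker to stop once the queue is drained
thread.join()
print(results[task_id])  # scanned:doc-42
```

The caller only ever holds a task id, which is what lets the API stay responsive while a crawl or extraction runs elsewhere.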

Diagram legend: request and data path; persisted store / output; sub-component inside a tier.

The full service estate, gateway, tenancy and delivery model are covered in Platform Architecture below.

Section 03 · End to end

From a raw source to an enforced policy

A document makes the same journey every time. A source is configured once; from then on the platform crawls it, extracts its text, profiles it for personal data, checks it against the tenant's policies, acts on the verdict, and reports the result. Cheap, deterministic stages run first; the expensive AI and enforcement steps run only on what reaches them.

Diagram legend: main pipeline path; shared store, touched at every stage; output to the user.

What the platform does, in six verbs

01 · Connect

Wire in email, file storage and chat through one connector per source.

02 · Scan & read

Crawl each source and extract text, including OCR for images and scans.

03 · Classify

Profile every document for personal and sensitive data against the taxonomy.

04 · Decide

Check findings against the tenant's retention and sensitivity policies.

05 · Act

Edit, archive, restrict or delete, always confirmed by the data owner.

06 · Report

Dashboards, alerts and exportable reports for evidence and audit.
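Of the six verbs, Decide is the one that turns classification output into something actionable. A minimal sketch of a retention check, assuming each tenant policy maps a data category to a maximum age, might look like this. `Policy`, `Finding`, `violations` and the category names are hypothetical; the platform's actual policy model is not described on this page.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Policy:
    category: str      # hypothetical taxonomy label, e.g. "health"
    max_age_days: int  # retention limit for records in this category


@dataclass(frozen=True)
class Finding:
    record_id: str
    category: str
    last_modified: date


def violations(
    findings: list[Finding], policies: list[Policy], today: date
) -> list[str]:
    """Return ids of records held longer than their category allows."""
    limits = {p.category: timedelta(days=p.max_age_days) for p in policies}
    return [
        f.record_id
        for f in findings
        if f.category in limits and today - f.last_modified > limits[f.category]
    ]
```

The function only flags records. In keeping with the Act step, the flagged ids would go to the data owner for confirmation rather than being acted on directly.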

Section 04 · Read the detail

Inside this deep dive

The map above stays deliberately shallow. Four focused field manuals carry the detail, each one owning a distinct layer of the platform with no overlap between them.

01 · Platform Architecture

The ~42-service estate: app tier, storage, gateway, tenancy and delivery.

02 · Data Pipeline

The seven-stage journey from connected source to enforced policy.

03 · OCR Pipeline

Five passes that turn a raw page into clean text and signals.

04 · AI Profiler

NER, phrase and dependency lenses, and the full classification taxonomy.

Where this fits · Start here

One map, four field manuals.

This overview is the entry point to the Deep Dive section. Read it for the shape of the whole platform, then follow a link above to the layer you need. Each deep dive is self-contained and assumes only what is on this page.

deep-dive · support.dataandmore.com/en/knowledge/deep-dive