The product, explained
Diagnose. Remove. Prove.
Three things Rosa does to a dataset, and one thing it never does: change the statistical distribution of your columns.
Measure how recoverable the protected attribute is
You supply a CSV and a short JSON configuration naming one protected
attribute (the bias_columns entry), any columns to ignore,
and which columns are categorical. That is the whole setup.
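As a rough illustration, a configuration of this shape could be built and checked as below. Only the bias_columns key comes from the description above; ignore_columns and categorical_columns are assumed key names, and the column names are made up for the example.

```python
import json

# Hypothetical configuration: only "bias_columns" is a key name taken from the text.
# "ignore_columns" and "categorical_columns" are assumed names for illustration.
config = {
    "bias_columns": ["race"],                     # one protected attribute per run
    "ignore_columns": ["applicant_id"],           # columns left out of the analysis
    "categorical_columns": ["race", "postcode"],  # columns treated as categories
}

def validate(cfg):
    """Check the one constraint the text states: a single protected attribute."""
    if len(cfg.get("bias_columns", [])) != 1:
        raise ValueError("name exactly one protected attribute")
    return cfg

config_json = json.dumps(validate(config), indent=2)
print(config_json)
```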
Rosa runs the Fair Adversarial Network (FAN), the proven adversarial method at Rosa's core: a discriminator repeatedly tries to recover the protected attribute from the rest of the data. How well it succeeds is a direct measure of how much bias the data encodes, whether the signal is carried directly or through proxies (a feature that stands in for the protected attribute, like a postcode for race, or an employment gap for gender).
No proxy labelling required. You name only the protected attribute. Rosa finds the proxies itself, and scores each column's contribution to the encoded bias.
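To make "recoverability" concrete, here is a deliberately simple stand-in, not the FAN itself: it guesses the protected attribute from one column at a time and reports how far that beats always guessing the most common group. The function name, the toy rows, and the scoring rule are all assumptions for illustration, and the score is computed in-sample with no held-out data.

```python
from collections import Counter, defaultdict

def recoverability(rows, protected, column):
    """Accuracy of guessing `protected` from `column` alone, minus the majority-class baseline."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[column]][row[protected]] += 1
    # Best guess for each column value is the protected label seen with it most often.
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    baseline = Counter(row[protected] for row in rows).most_common(1)[0][1]
    return (correct - baseline) / len(rows)

# Toy data: postcode tracks race exactly, income_band does not track it at all.
rows = [
    {"race": race, "postcode": postcode, "income_band": band}
    for race, postcode in (("A", "N1"), ("B", "S2"))
    for band in ("low", "high", "low", "high")
]

scores = {col: recoverability(rows, "race", col) for col in ("postcode", "income_band")}
print(scores)
```

In this toy data postcode acts as a perfect proxy and scores well above zero, while income_band carries no signal about race and scores zero. A real discriminator would consider columns jointly rather than one at a time.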
Output: a measured bias score and a PDF Dataset Intake Report, plus the Run Manifest.
Transform the data, preserve its statistics
Rosa transforms the dataset so a downstream model cannot distinguish individuals by the protected attribute. It uses rank-mapping: values move to fair rank positions, but each column keeps its own distribution. In training output the distribution is preserved bit-for-bit, to float64 precision; in inference output, to roughly 1e-6.
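The reason rank-mapping preserves a column's distribution can be shown in a few lines. This is a minimal sketch of the general idea, not Rosa's implementation: the fair_scores list is a hypothetical stand-in for whatever ranking signal the trained model produces, and rank_map is an illustrative name.

```python
def rank_map(original, target_scores):
    """Place the sorted original values at the rank positions given by target_scores.

    The output is a permutation of `original`, so its distribution is unchanged.
    """
    if len(original) != len(target_scores):
        raise ValueError("inputs must have the same length")
    # Row indices ordered from lowest to highest target score (stable for ties).
    order = sorted(range(len(target_scores)), key=lambda i: target_scores[i])
    out = [None] * len(original)
    for value, i in zip(sorted(original), order):
        out[i] = value
    return out

incomes = [52000.0, 31000.0, 78000.0, 45000.0]
fair_scores = [0.9, 0.1, 0.4, 0.6]  # hypothetical fair ranking signal, one per row
remapped = rank_map(incomes, fair_scores)

# Every original value is still present exactly once; only row positions changed.
assert sorted(remapped) == sorted(incomes)
```

Because the output is only a reordering of the real values, the mean, variance, and every quantile of the column are identical before and after.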
That preservation is the key technical promise. Your data stays usable, your aggregate statistics stay true, and your model stays calibrated, because Rosa does not generate synthetic data: it reassigns your real values fairly. The one exception is standard pre-processing: empty cells are imputed before training (the column mean for numeric columns, the most frequent value for categorical columns), the same step most data pipelines already apply.
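The imputation rule described above is simple enough to sketch directly. The function below is illustrative only: it assumes empty cells have already been read in as None, and the name impute is not taken from Rosa.

```python
from collections import Counter
from statistics import mean

def impute(values, categorical):
    """Fill None cells: most frequent value if categorical, column mean otherwise."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no observed values to impute from")
    fill = Counter(present).most_common(1)[0][0] if categorical else mean(present)
    return [fill if v is None else v for v in values]

ages = impute([30.0, None, 50.0], categorical=False)
cities = impute(["Leeds", None, "Leeds", "York"], categorical=True)
print(ages, cities)
```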
Output: a debiased CSV, a trained FAN model for use on future data, and a PDF report.
Model race-disparity: 0.54; with Rosa: 0.09.
Column distribution: same distribution, preserved.
See the methodology
Evidence as a byproduct of the work
Every run emits an immutable Run Manifest: job id, mode, input hash, schema hash, config hash, container digest, timestamps, row counts, and the bias summary, alongside the PDF report.
The input hash is the SHA-256 of the raw input file's bytes. Anyone holding the original file can recompute it and verify exactly what was processed. The manifest is written once per job, including failed ones, and retained indefinitely.
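Recomputing the input hash needs nothing beyond a standard SHA-256 implementation. A minimal sketch using Python's hashlib follows; the "sha256:" prefix mirrors how the manifest example displays hashes, and the function name and chunk size are choices made for this example.

```python
import hashlib

def input_hash(path, chunk_size=1 << 20):
    """SHA-256 of the file's raw bytes, read in chunks so large CSVs need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:  # binary mode: hash the bytes, not decoded text
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()

# Usage (the path is hypothetical): compare the result with input_hash in the manifest.
# print(input_hash("input.csv"))
```

Opening the file in binary mode matters: any re-encoding or change of line endings alters the bytes and therefore the hash.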
Audit evidence is generated by doing the work, not assembled as a separate documentation exercise afterwards.
- job_id: 550e8400-e29b-41d4-a716-446655440000
- mode: remove_bias_training
- job_status: complete
- timestamp_submitted: 2026-06-10T09:14:02Z
- timestamp_completed: 2026-06-10T09:31:47Z
- row_count: 2,000
- bias_columns: ["race"]
- input_hash: sha256:9f1c…e7a2
- schema_hash: sha256:4b08…21cd
- config_hash: sha256:d3aa…90f4
- container_digest: sha256:71be…0c55
- bias (pre): 0.21
- residual_bias: 0.001
- artifacts: remove_bias_report.pdf, compas_preconditioned_fair.csv
Rosa becomes a stage in your pipeline
A model trained on Rosa-debiased data should receive Rosa-debiased data at inference. So Rosa is not a one-time clean-up: it becomes a permanent, auditable stage in your data pipeline. Train once, then run inference on your operational data as it arrives.
That is a feature, not a catch. It means fairness is continuously applied and continuously evidenced: every batch that passes through the pipeline leaves a manifest behind it.
Three ways in: portal, REST, MCP
Customer Portal
One-click in the browser, including Test 1, the Apple Card demo. Upload a CSV, run Diagnose or Remove, download the outputs and the manifest.
portal.rosadebias.com
REST API
An asynchronous job API at api.rosadebias.com/v1: submit a
job, poll its status, fetch artifacts, report, and manifest.
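The submit-then-poll pattern can be sketched without assuming any endpoint paths, which this page does not list. In the example below, get_status is a caller-supplied function that would wrap the real status request; the "failed" and "cancelled" status names are assumptions, since only "complete" appears in the manifest example.

```python
import time

def wait_for_job(get_status, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll get_status() until the job reaches a terminal state or the timeout passes."""
    terminal = {"complete", "failed", "cancelled"}  # only "complete" appears in the text
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval

# Stand-in for a real HTTP call, so the sketch runs without network access.
_fake_statuses = iter(["queued", "running", "complete"])
result = wait_for_job(lambda: next(_fake_statuses), sleep=lambda seconds: None)
print(result)
```

Passing the status function and the sleep function in as arguments keeps the loop testable and leaves authentication and URL details to the caller.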
MCP server
Eight tools over the Model Context Protocol (MCP), the open standard for connecting AI agents to tools: diagnose, remove bias, job status, report, artifacts, manifest, list jobs, cancel.
MCP guide in the portal
Honest scope
- Rosa removes bias it can statistically detect, on one protected attribute per run (univariate, in this phase).
- It preserves your data's distributions; it does not synthesise records. The only values Rosa fills in are missing cells, imputed the standard way before training (column mean for numeric, most frequent value for categorical).
- It will decline to "debias" a signal it cannot measure, and it says so rather than producing a hollow result.
- Residual bias is dataset-specific. Your manifest reports the figure for your data; we do not quote a universal number.
See it run before you believe it.
Test 1 runs in the browser in one click. No credit card.