Utility vs. Fidelity

This repository contains experiments on the relationship between utility and fidelity in differential privacy.

Reproduction

To run this project, you need uv installed on your system; you can install it through your system's package manager or through this link.

To reproduce the results, clone the project and run the following commands in the project directory:

uv sync
uv run marimo edit

Your default browser should now open a notebook, where you can run the experiments as you please. If you prefer to run the dp_ml experiments headlessly, you can do so via:

uv run notebooks/run_experiment.py

Once the experiments have finished and a CSV file has been generated, press the toggle in the notebook to skip running the experiments and go straight to plotting.
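The toggle amounts to a cache-or-run pattern: reuse the generated CSV if it exists, otherwise run the experiments and save their output. The sketch below illustrates that pattern with the standard library only. It is not the notebook's actual code; the function name load_or_run, the run callback, and the results path are hypothetical.

```python
import csv
from collections.abc import Callable
from pathlib import Path

Row = dict[str, str]

# Hypothetical default location; run_experiment.py may write elsewhere.
RESULTS_CSV = Path("data/results/results.csv")


def load_or_run(
    path: Path,
    run: Callable[[], list[Row]],
    skip_experiments: bool,
) -> list[Row]:
    """Reuse an existing results CSV when skipping is requested; otherwise run and save.

    `run` is a hypothetical stand-in for the actual experiment code.
    """
    if skip_experiments and path.exists():
        # Go straight to plotting: read the previously generated results.
        with path.open(newline="") as f:
            return list(csv.DictReader(f))

    rows = run()
    if not rows:
        raise ValueError("run() returned no rows to save")

    # Persist the results so a later session can skip the experiments.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling load_or_run(RESULTS_CSV, run, skip_experiments=False) forces a fresh run and overwrites the CSV, which roughly corresponds to leaving the toggle off.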

Keep in mind that running the experiments requires the corresponding datasets. Each notebook has a dropdown at the top for choosing the dataset: acs_income is downloaded automatically when selected, while the others can be found here and must be placed in the data/datasets directory:

  1. Adult Income Dataset: Predict whether an individual makes over $50,000 a year.
  2. UCI Heart Disease Dataset: This is a public healthcare classification dataset and could serve as a reproducible alternative to the closed COVID dataset.
  3. German Credit / South German Credit Dataset: This is a public credit-risk classification dataset with mixed categorical and numerical attributes.
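Before launching a notebook, it can help to confirm that the manually downloaded datasets are in place. The following sketch does that check. The file names in EXPECTED_FILES are hypothetical placeholders, since the exact names and formats the notebooks expect are not specified here; acs_income is skipped because it is downloaded automatically.

```python
from pathlib import Path

# Hypothetical file names; the notebooks may expect different names or formats.
EXPECTED_FILES = {
    "adult": "adult.csv",
    "heart_disease": "heart_disease.csv",
    "german_credit": "german_credit.csv",
}


def missing_datasets(dataset_dir: Path = Path("data/datasets")) -> list[str]:
    """Return the names of manually downloaded datasets not found in dataset_dir.

    acs_income is not checked because the notebooks download it automatically.
    """
    return [
        name
        for name, filename in EXPECTED_FILES.items()
        if not (dataset_dir / filename).is_file()
    ]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing datasets:", ", ".join(missing))
    else:
        print("All manually downloaded datasets are present.")
```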