Leon A. 677d83952e Update README.md		2026-06-23 14:24:46 +02:00
data/results	remove datasets for file size reasons	2026-06-23 13:52:43 +02:00
images	add local DP things	2026-06-23 13:50:12 +02:00
notebooks	add local DP things	2026-06-23 13:50:12 +02:00
references	refactor	2026-05-26 10:00:36 +02:00
reports	add dp ml models	2026-06-15 11:22:54 +02:00
src	add local DP things	2026-06-23 13:50:12 +02:00
.gitignore	remove datasets for file size reasons	2026-06-23 13:52:43 +02:00
.python-version	refactor	2026-05-26 10:00:36 +02:00
main.py	refactor	2026-05-26 10:00:36 +02:00
pyproject.toml	add local dp experiment	2026-06-23 10:15:53 +02:00
README.md	Update README.md	2026-06-23 14:24:46 +02:00
uv.lock	add local dp experiment	2026-06-23 10:15:53 +02:00

README.md

Utility vs. Fidelity

This repository contains experiments on the relationship between utility and fidelity in differential privacy.

Reproduction

To run this project, you must have uv installed on your system. Download it via your system's package manager or through this link.

To reproduce the results simply clone the project and run the following commands in the project directory:

uv sync
uv run marimo edit

Your default browser should now open a notebook, where you can run the experiments as you please. If you prefer to run the dp_ml experiments headlessly, you can do so via:

uv run notebooks/run_experiment.py

Once the experiments finish and a CSV file is generated, simply press the toggle in the notebook to skip running the experiments and go straight to plotting.
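The skip-and-plot behaviour described above can be pictured as a small caching pattern: reuse the CSV if it exists, otherwise run the experiments and write it. The following is only a minimal sketch of that idea, not the notebook's actual code; the function name load_or_run, the skip_if_cached flag, and the CSV layout are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def load_or_run(csv_path, run_experiments, skip_if_cached=True):
    """Return result rows, reusing csv_path when it exists and skipping is enabled.

    Hypothetical helper: `run_experiments` is any callable that returns a
    non-empty list of dicts sharing the same keys.
    """
    csv_path = Path(csv_path)

    # Cached results: read them back instead of re-running the experiments.
    # Note that values read from a CSV come back as strings.
    if skip_if_cached and csv_path.exists():
        with csv_path.open(newline="") as f:
            return list(csv.DictReader(f))

    rows = run_experiments()
    if not rows:
        raise ValueError("run_experiments returned no rows")

    # Persist the results so the next call can go straight to plotting.
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

With this shape, the toggle in the notebook would correspond to skip_if_cached: leaving it on reads the existing CSV, and turning it off forces a fresh run that overwrites the file.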

Keep in mind that running the experiments requires the corresponding datasets. There is a dropdown at the top of each notebook. The acs_income dataset will download automatically if chosen; the rest can be downloaded from the links below and must be placed into the data/datasets directory:

Adult Income Dataset: Predict whether an individual makes over $50,000 a year.
- Dataset: https://www.kaggle.com/datasets/wenruliu/adult-income-dataset
UCI Heart Disease Dataset: This is a public healthcare classification dataset and could serve as a reproducible alternative to the closed COVID dataset.
- Dataset: UCI Heart Disease Dataset https://archive.ics.uci.edu/dataset/45/heart%2Bdisease
German Credit / South German Credit Dataset: This is a public credit-risk classification dataset with mixed categorical and numerical attributes.
- Dataset: UCI South German Credit https://archive.ics.uci.edu/dataset/522/south%2Bgerman%2Bcredit
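
Before opening a notebook, it can help to confirm that the manually downloaded files are actually present in data/datasets. The sketch below shows one way to do that. The file names in EXPECTED_FILES are placeholders chosen for illustration, not the names the notebooks actually expect, so adjust them to match your downloads.

```python
from pathlib import Path

# Hypothetical file names: the notebooks may expect different ones.
EXPECTED_FILES = {
    "adult_income": "adult.csv",
    "heart_disease": "heart_disease.csv",
    "german_credit": "german_credit.csv",
}


def missing_datasets(dataset_dir="data/datasets", expected=EXPECTED_FILES):
    """Return the sorted names of datasets whose file is not in dataset_dir."""
    root = Path(dataset_dir)
    return sorted(
        name for name, filename in expected.items()
        if not (root / filename).is_file()
    )


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing datasets:", ", ".join(missing))
    else:
        print("All manually downloaded datasets are in place.")
```

The acs_income dataset is left out of the check on purpose, since it downloads automatically when selected.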