Target Value Optimization in Neural Networks

When training a classification neural network, each output neuron represents a class. For example, in a 3-class problem (cat, dog, rat), a typical target vector for a cat image might be:

`[1, 0, 0]` (cat = 1, dog = 0, rat = 0)

These values (class = 1, non-class = 0) are compared to the network's output to compute the loss and update the weights.

However, fixed values like 1 and 0 are not always optimal for learning. Adjusting the class and non-class target values (e.g., 0.8 and 0.2) can improve training performance, especially with saturating activations like the sigmoid, by keeping targets out of the flat regions of the activation and thus enhancing gradient flow.
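As a minimal sketch of the idea above (the helper name `soft_targets` and the 0.8/0.2 defaults are illustrative, not the project's actual API — the project optimizes these values during training):

```python
import numpy as np

def soft_targets(labels, num_classes, class_val=0.8, nonclass_val=0.2):
    """Build target vectors where the true class gets `class_val`
    and all other positions get `nonclass_val` (instead of 1 and 0)."""
    targets = np.full((len(labels), num_classes), nonclass_val)
    targets[np.arange(len(labels)), labels] = class_val
    return targets

# e.g. for the 3-class (cat, dog, rat) example: a cat and a rat sample
print(soft_targets([0, 2], 3))
# [[0.8 0.2 0.2]
#  [0.2 0.2 0.8]]
```

These softened vectors are then compared against the sigmoid outputs in place of the usual one-hot targets.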

This project explores dynamically optimizing these values during training instead of using fixed constants. Early results show improvements in training speed and confidence, though gains in test accuracy are yet to be achieved.

📄 See /docs for more details.

TO-DO

Implementation

  • Implement basic σ-adaptation logic (non-class value only)
  • Switch to using one non-class value for each class instead of one global one
  • Fix confidence calculation: use the cosine similarity of the output vector to the closest target vector (not necessarily the correct class's target)
  • Log current class/non-class values
  • Set up structured experiments and logging
  • Build a simple frontend for selecting the method, starting/stopping training, and displaying results in a graph (possibly via TensorBoard or Weights & Biases)
  • Find new ways of initializing class and non-class values
    • Uniform init (poor performance)
    • Soft target init
    • Implement initial nudge for network preference init
    • Try taking all the initial preferences and remapping them to use the full (0,1) range
  • Try pushing after each batch
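The fixed confidence calculation from the list above could look roughly like this (a sketch under assumptions: `output` is one network output vector, `targets` is a matrix whose rows are the per-class target vectors; the function name is hypothetical):

```python
import numpy as np

def confidence(output, targets):
    """Cosine similarity of `output` to the *closest* row of `targets`,
    which need not be the correct class's target vector."""
    sims = targets @ output / (
        np.linalg.norm(targets, axis=1) * np.linalg.norm(output) + 1e-12
    )
    return float(sims.max()), int(sims.argmax())

targets = np.eye(3)                      # plain one-hot targets for illustration
conf, closest = confidence(np.array([0.9, 0.1, 0.1]), targets)
print(closest)  # 0  (closest target is class 0)
```

Using the closest target rather than the correct one means a confidently wrong prediction still counts as high confidence, which is the intended semantics of the metric.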

Algorithm Design

  • Refine the σ-adaptation strategy (tuning, edge cases)
  • Explore and prototype additional adaptation methods

Research & Evaluation

  • Define evaluation metrics beyond accuracy/loss (e.g. training speed, confidence margin)
  • Analyze, visualize, and summarize results
  • Draft and outline the research paper
  • Finish paper