RSNA 2003 Scientific Papers
  SESSION: Physics (Image Processing: CAD I--Breast)

Avoiding Overfitting and Increasing Generalizability of Artificial Neural Networks in CAD by Training with Jitter

  DATE: Monday, December 01 2003
  START TIME: 11:50 AM
  END TIME: 11:57 AM
  CODE: C18-381

Richard Zur
Chicago, IL
Yulei Jiang PhD

Computers, diagnostic aid
Computers, neural network
Computers, simulation

Purpose: To determine whether adding jitter to the training data of artificial neural networks (ANNs) trained on a mammography database can, in effect, eliminate overfitting and increase the generalizability of the classifier.

Methods and Materials: "Jitter" refers to a random vector added to the input data between ANN training iterations. The rationale for continually moving the input values during training in this way was to increase the generalizability of the ANNs by better approximating the underlying distribution. We performed simulation studies on data drawn from independent bivariate normal distributions with an ideal observer Az value of 0.84. We trained on 20 cases (10 abnormal) and tested on 2000 cases (1000 abnormal). Groups of feed-forward ANNs with 2 input nodes, 2 hidden nodes, and a single output node were trained using gradient descent and error back-propagation. In addition, a mammography database of 200 images was arbitrarily jackknifed (by patient) multiple times into training sets of about 41 images (19 abnormal) and test sets of 159 images (69 abnormal). Groups of feed-forward ANNs with 8 input nodes, 10 hidden nodes, and a single output node were trained as before. Jitter was drawn from a normal distribution. We investigated a range of variance values for this distribution to find the optimal value. ANN output was analyzed using receiver operating characteristic (ROC) curve analysis.
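The simulation setup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the unit-variance, uncorrelated class distributions, the separation of 1.406 (chosen so that an equal-variance ideal observer would score about 0.84), the sigmoid activations, the squared-error loss, the learning rate, the iteration count, and the jitter standard deviation of 0.5 are all hypothetical choices. The empirical (Mann-Whitney) area under the ROC curve stands in for Az, which is normally obtained from a fitted ROC curve.

```python
import numpy as np

def make_data(n_per_class, d_prime, rng):
    """Two classes from unit-variance, uncorrelated bivariate normals (assumed)."""
    shift = d_prime / np.sqrt(2.0)  # equal shift on both axes gives separation d_prime
    x0 = rng.normal(0.0, 1.0, size=(n_per_class, 2))    # normal cases
    x1 = rng.normal(shift, 1.0, size=(n_per_class, 2))  # abnormal cases
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return x, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    """2-input, 2-hidden, 1-output feed-forward pass."""
    w1, b1, w2, b2 = params
    h = sigmoid(x @ w1 + b1)    # hidden activations, shape (n, 2)
    out = sigmoid(h @ w2 + b2)  # network output, shape (n,)
    return h, out

def train(x, y, jitter_sd, n_iter, lr, rng):
    """Batch gradient descent with back-propagation; new jitter every iteration."""
    w1 = rng.normal(0.0, 0.5, size=(2, 2))
    b1 = np.zeros(2)
    w2 = rng.normal(0.0, 0.5, size=2)
    b2 = 0.0
    n = len(y)
    for _ in range(n_iter):
        # Fresh random vector added to every training case (zero when jitter_sd == 0).
        xj = x + rng.normal(0.0, jitter_sd, size=x.shape)
        h, out = forward((w1, b1, w2, b2), xj)
        # Back-propagate the squared-error gradient through both sigmoid layers.
        d_out = (out - y) * out * (1.0 - out)          # shape (n,)
        d_hid = np.outer(d_out, w2) * h * (1.0 - h)    # shape (n, 2)
        w2 -= lr * (h.T @ d_out) / n
        b2 -= lr * d_out.mean()
        w1 -= lr * (xj.T @ d_hid) / n
        b1 -= lr * d_hid.mean(axis=0)
    return w1, b1, w2, b2

def empirical_auc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 0.5)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

rng = np.random.default_rng(0)
x_train, y_train = make_data(10, 1.406, rng)    # 20 cases, 10 abnormal
x_test, y_test = make_data(1000, 1.406, rng)    # 2000 cases, 1000 abnormal
for sd in (0.0, 0.5):                           # without and with jitter
    params = train(x_train, y_train, jitter_sd=sd, n_iter=2000, lr=0.5, rng=rng)
    _, scores = forward(params, x_test)
    print(f"jitter sd = {sd}: test AUC = {empirical_auc(scores, y_test):.3f}")
```

Sweeping `jitter_sd` over a grid and tracking the test AUC as a function of iteration count would mirror the search for an optimal variance described in the abstract. The numbers printed by this sketch depend on the random seed and the assumed hyperparameters and should not be read as a reproduction of the reported results.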

Results: Our simulation studies showed that it was possible to achieve the ideal observer Az value by training with jitter. Without jitter, we were able to achieve an average Az value of 0.78, with some indication of overfitting. ANNs trained with jitter approached an Az value of 0.84, with some fluctuation present. For the mammography dataset, ANNs trained with no jitter reached, for different jackknifed data sets, maximum Az values ranging from 0.81 to 0.85. After many iterations, the Az decreased significantly, indicating overfitting. Adding jitter resulted in ANNs that were able to attain maximum Az values ranging from 0.87 to 0.90 (p < 0.035).
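The ideal observer Az of 0.84 is the ceiling against which the simulation results are judged. As a rough illustration, if the two classes are assumed to be normal with equal covariance (an assumption; the abstract does not give the distribution parameters), the ideal observer's area under the ROC curve is Phi(d'/sqrt(2)), where d' is the separation between the class means in standard deviation units and Phi is the standard normal cumulative distribution function. The function names below are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ideal_az(d_prime):
    """Ideal-observer AUC for two equal-covariance normal classes (assumed model)."""
    return normal_cdf(d_prime / math.sqrt(2.0))

def d_prime_for_az(target_az, lo=0.0, hi=10.0, tol=1e-10):
    """Invert ideal_az by bisection; ideal_az increases monotonically with d_prime."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ideal_az(mid) < target_az:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d = d_prime_for_az(0.84)
print(f"d' = {d:.3f}, ideal Az = {ideal_az(d):.3f}")
```

Under this assumed model, a separation of roughly 1.41 standard deviations corresponds to an ideal observer Az of 0.84, so the reported average of 0.78 without jitter leaves a visible gap to the ceiling.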

Conclusion: Adding jitter to ANN training data in between iterations can prevent overfitting while creating ANNs that generalize better. This technique may be useful to obtain classifiers in situations where the amount of training data is limited.