Going back to the Signal Detection paradigm: as a rule, it suggests that increasing the number of data points will reduce false positives (alpha), and reducing false positives was a major objective of this research. Frankly, for a time I was flummoxed. Then I realized I was looking at the problem incorrectly: the problem is with the resolution, or granularity, of the measurement.
A fundamental assumption of the Signal Detection paradigm is the concept of a defined event, or event window, and the task of detecting whether or not a signal is present within that window. The increased sampling rate compounded error, particularly false positive errors. In effect, the system would take two or more samples within the conditions that set off the false positive, producing multiple false positives within an event window where only one should have been recorded.
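A minimal sketch of the effect, assuming a hypothetical threshold detector fed pure noise (the threshold, window size, and counts here are illustrative, not from the original system): counting every sample that crosses the threshold inflates the false positive tally, while counting at most one detection per event window keeps it at the rate the paradigm assumes.

```python
import random

random.seed(42)

THRESHOLD = 1.5      # hypothetical detection threshold
WINDOW_SIZE = 10     # samples per event window
NUM_WINDOWS = 100    # windows of pure noise (no signal present)

per_sample_fp = 0    # naive: every threshold crossing counts
per_window_fp = 0    # at most one false positive per window

for _ in range(NUM_WINDOWS):
    # Pure noise, so any threshold crossing is a false positive.
    window = [random.gauss(0.0, 1.0) for _ in range(WINDOW_SIZE)]
    crossings = sum(1 for x in window if x > THRESHOLD)
    per_sample_fp += crossings
    per_window_fp += 1 if crossings else 0

print(f"False positives counted per sample: {per_sample_fp}")
print(f"False positives counted per window: {per_window_fp}")
```

Doubling the sampling rate doubles the expected per-sample count even though nothing about the underlying events has changed, while the per-window count is bounded at one per window.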
How do we overcome the problem of oversampling, of setting the wrong-size event window? Here are some things that come to mind:
- First, simply recognizing that there is an event-window problem may be the most difficult step. This particular situation suggested an event-window problem because the results ran counter to expectations. Having primarily a theoretical perspective, I am not the best one to address this issue.
- Finding the right event window may involve a tuning or "dialing-in" process. However it is done, it may take many samples at various sampling resolutions to determine the best, or an acceptable, level of resolution.
- Consider adding a waiting period once a signal has been detected. The hope is that the waiting period will reduce the chances of making a false positive error; a sketch of this idea follows the list.
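A minimal sketch of the waiting-period idea, again with a hypothetical detector (the function name, threshold, and refractory length are assumptions for illustration): after each detection, incoming samples are ignored for a fixed refractory period, so one noisy excursion above threshold registers once rather than several times.

```python
def detect_with_refractory(samples, threshold, refractory):
    """Return indices where a detection fires, ignoring samples
    for a fixed refractory period after each detection."""
    detections = []
    ignore_until = 0
    for i, x in enumerate(samples):
        if i < ignore_until:
            continue  # still inside the waiting period
        if x > threshold:
            detections.append(i)
            ignore_until = i + refractory
    return detections

# One brief excursion above threshold registers once, not three times.
samples = [0.2, 1.8, 1.9, 1.7, 0.1, 0.3, 1.6, 0.2]
print(detect_with_refractory(samples, threshold=1.5, refractory=4))
# -> [1, 6]
```

Note that the refractory length plays much the same role as the event window: too short and duplicate detections return, too long and genuinely separate events are merged into one.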
I think you need to think in terms of:
- sample rate (how often you make a measurement)
- resolution (how fine-grained each measurement is)
- accuracy (how close your measurements are to the true value)
- precision (how repeatable your measurements are)
Then it gets into how accurate your calculations are ...
It does not really make sense to me that more data is bad unless there is a problem in precision or accuracy, or in the algorithm that evaluates the data.
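A minimal sketch of the accuracy/precision distinction from that list, using two invented instruments (all values here are illustrative): one is biased but highly repeatable, the other unbiased but noisy.

```python
import random
import statistics

random.seed(0)
TRUE_VALUE = 10.0

# Hypothetical instruments measuring the same true value:
precise_but_inaccurate = [TRUE_VALUE + 2.0 + random.gauss(0.0, 0.05)
                          for _ in range(1000)]   # tight spread, wrong center
accurate_but_imprecise = [TRUE_VALUE + random.gauss(0.0, 2.0)
                          for _ in range(1000)]   # right center, wide spread

for name, readings in [("precise/inaccurate", precise_but_inaccurate),
                       ("accurate/imprecise", accurate_but_imprecise)]:
    bias = statistics.mean(readings) - TRUE_VALUE   # accuracy: closeness to truth
    spread = statistics.stdev(readings)             # precision: repeatability
    print(f"{name}: bias={bias:+.3f}, stdev={spread:.3f}")
```

More samples average away imprecision but do nothing for a systematic bias, which is one concrete way "more data" can fail to help.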