Short question about Polimago and the leave out test: how reliable is the leave out test compared to testing with a completely independent set of test images? If I understood the help file correctly, the leave out test recalculates the classifier after removing a few images and applies that classifier to the removed images. But doesn’t this effective reduction in training images reduce the overall reliability of the classifier? Of course such an effect would be less pronounced for larger training sets, but is it possible to give some indication of how large this effect is?
Hi @CvK!
You are of course absolutely right: nothing ever beats a test set in terms of validity and realism (at least as long as the test set has been selected with common sense and randomness). However, there are definitely cases where acquiring enough images to build not only a decent training set but also a sufficiently big test set is either not possible or not economically sane.
In these cases, where you need to make ends meet with whatever images you can get your hands on, the hold out test (or leave out test) is an excellent plan B. The idea of this test is simple, and you described it correctly (a short sketch in code follows after the list):
- If you are training on N images, in every cycle you remove a set of M images from the learning set, then calculate a classifier on the remaining set of N - M images.
- The resulting classifier is then used to classify the M images that have been removed.
- These cycles are repeated until each and every image in the training set has been removed once, gathering the classification results as you go.
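For illustration, here is a minimal sketch of that cycle in Python. The `train` and `classify` callables are placeholders standing in for whatever Polimago does internally; they are not part of the actual CVB/Polimago API:

```python
import random

def leave_out_test(samples, labels, train, classify, hold_out_size):
    """Estimate classification accuracy by repeatedly holding out a small
    chunk of the training set (train/classify are placeholder callables)."""
    order = list(range(len(samples)))
    random.shuffle(order)                     # remove images in random order
    mistakes = 0
    # Walk over the whole set in chunks of M = hold_out_size images ...
    for start in range(0, len(order), hold_out_size):
        held_out = set(order[start:start + hold_out_size])
        kept = [i for i in order if i not in held_out]
        # ... train a classifier on the remaining N - M images ...
        clf = train([samples[i] for i in kept], [labels[i] for i in kept])
        # ... and classify the M images that were removed.
        mistakes += sum(classify(clf, samples[i]) != labels[i] for i in held_out)
    return 1.0 - mistakes / len(samples)      # fraction classified correctly
```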
Back in the days of Manto this approach was not really an option, because the SVM calculation of the classifiers on the reduced training sets would all have had to be carried out from scratch, rendering the whole test prohibitively long (in fact we once had software that did exactly that; on a typical training set it would usually run for almost a week). With Polimago, however, things are different: a lot of the calculations can be re-used, bringing the processing time of a leave out test down to the range of hours rather than days.
But does this reduction reduce the overall reliability of the classifier? Yes, but only slightly. The key parameter here is the hold out size, which I recommend setting as low as you can afford (lower values mean longer calculations). At any rate it needs to be significantly smaller than the size of your training set. Assume you are training on 1000 images and set the hold out size to e.g. 5: you then remove only 0.5% of the training set in each cycle - a very small fraction of the whole set - and as each contributing image has the same weight in Polimago, one might argue that the result quality can hardly be far off (and the capital sin of having your test set influence the classifier you want to test is completely avoided).
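In numbers (using just the example figures from the previous paragraph, nothing Polimago-specific):

```python
n, m = 1000, 5              # training set size N, hold out size M
fraction = m / n            # 0.005, i.e. 0.5% removed per cycle
cycles = n // m             # 200 classifier recalculations in total
print(f"{fraction:.1%} removed per cycle, {cycles} cycles")
```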
Unfortunately I know of no way to calculate just how much uncertainty the hold out test carries. What I have seen with sets like the MNIST benchmark is that as long as the hold out size is significantly smaller (factor 100+) than the size of the class with the fewest images, you can usually trust the hold out test results (meaning that on a test set of comparable size the percentage of classification mistakes might differ by a fraction of a percent, but not more).
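If it helps, that rule of thumb can be phrased as a quick sanity check. The factor of 100 is simply the value quoted above, not a hard constraint from the library:

```python
from collections import Counter

def hold_out_size_ok(labels, hold_out_size, factor=100):
    """Rule of thumb from above: the hold out size should be at least
    `factor` times smaller than the smallest class in the training set."""
    smallest_class = min(Counter(labels).values())
    return hold_out_size * factor <= smallest_class
```

With a hold out size of 5, for example, this requires every class to contribute at least 500 images.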
Thanks for the expansive answer, exactly what I was looking for! It’s always hard to pin things down to an exact number (the correct answer to every engineering question seems to be “it depends…”). But the training set for my first classifier was apparently a bit too small for the leave out test, as the real world test outperformed the leave out test. Naturally, this made me suspicious: a streamlined, convenient and somewhat sloppier test is unlikely to be more sensitive to errors than a robust real world test.
I’ve increased the sample size to well above a hundred and also reduced the hold out size; that seemed to do the trick. The two tests now give comparable results, although the real test still seems slightly better. The leave out test is much quicker to run, though, so I think I’ll stick with that one and leave the robust real world test for a final validation step in the process.
edit: On a completely different track, it seems that there is no polimago tag. Can tags be created by everyone or do you need moderator status for that? I tried to add the tag to this topic but it didn’t work.
There now is a “polimago” tag - I created it after reading your post and added it to the topic. Tag creation requires trust level 3 (“regular”), which will be bestowed on active forum participants automatically over time (just as you progressed from “new” to “basic user” in the past).