Polimago Classification & Regression project for defect detection

Hi sir or madam,

My project is defect detection, so I created a Polimago Classification & Regression project to classify the defects.

I divided the samples into two classes. One is the Pass class with 300 samples; the other is the Failed class with 1200 samples.

I created the classifier and ran a sample test; the result is below. The Failed class has 16 errors and the Pass class has 8 errors.

May I ask how I can adjust the parameters so that both error counts become zero?

If you have any way to solve this, please tell me. Thank you.



First, you might want to get rid of the imbalance in the training set. Having a balanced training set (i.e. all classes have the same number of samples) should always be the goal. Otherwise you create a bias toward the class that has the most (or more) samples, which distorts the classifier's performance and makes your evaluation less accurate. In your case, you create a bias toward the Failed class: you've provided many more samples for that class, and thus the classifier "knows" this class much better. This is usually not desirable.
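One common way to remove such an imbalance is to randomly undersample the larger class down to the size of the smaller one. A minimal sketch (the file names and the helper below are hypothetical, just to illustrate the idea; Polimago itself is not involved here):

```python
import random

def balance_by_undersampling(samples_by_class, seed=0):
    """Randomly undersample every class down to the size of the smallest class.

    samples_by_class: dict mapping class name -> list of sample references
    (e.g. image file paths). Returns a new dict with equal class sizes.
    """
    rng = random.Random(seed)
    target = min(len(s) for s in samples_by_class.values())
    return {
        name: rng.sample(samples, target)
        for name, samples in samples_by_class.items()
    }

# Example with the class sizes from this thread (300 Pass vs. 1200 Failed);
# the file names are made up for illustration:
classes = {
    "Pass": [f"pass_{i}.png" for i in range(300)],
    "Failed": [f"fail_{i}.png" for i in range(1200)],
}
balanced = balance_by_undersampling(classes)
print({name: len(s) for name, s in balanced.items()})
# Both classes now contain 300 samples each.
```

The trade-off is that you discard Failed samples; the alternative would be to collect more Pass samples until both classes are of similar size.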

Your screenshot does not reflect the 300/1200 split of your classes. The screenshot states a 1496/391 split. Where do the additional images come from?

Regarding the training parameters, it is hard (or impossible) to say what will improve your results without having seen the actual training set images. But you could start with only "a" in your pre-processing code for maximum accuracy and maximum detail (again, this also depends on the images/use case) and toggle the "Interpolate" option. You could also increase the feature resolution.

Also, you already have an accuracy of about 98% (relative to the sum of 1887 samples and 37 errors), which is already really good for this kind of use case. I would not expect much improvement beyond that; you will never reach 100% in practice.
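For reference, the ~98% figure follows directly from the numbers in the screenshot (1887 samples, 37 errors):

```python
# Error rate and accuracy from the totals quoted above.
total_samples = 1887
total_errors = 37

error_rate = total_errors / total_samples
accuracy = 1 - error_rate

print(f"error rate: {error_rate:.2%}, accuracy: {accuracy:.2%}")
# → error rate: 1.96%, accuracy: 98.04%
```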


Dear Frank,

In my screenshot, the Pass amount is 391 = 300 (training set images) + 91 (test images),
and the Failed amount is 1496 = 1200 (training set images) + 296 (test images).
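The totals above can be checked in one line each:

```python
# Per-class totals = training images + test images, as stated above.
pass_total = 300 + 91
failed_total = 1200 + 296

print(pass_total, failed_total)
# → 391 1496  (matches the 1496/391 split in the screenshot)
```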

Thanks for the suggestions. I will follow them and try again.

One thing I’d like to point out here as well is that zero errors are often not a realistic goal - on the test set as well as the training set (we’re looking at a cross-validation here, so we don’t really need to make that distinction). Polimago uses statistical methods, and much depends on the actual content of the images. If there are some borderline cases (i.e. pictures that are hard to assign to either of the two classes), then it’s actually quite normal to have non-zero values in the Errors column, and the trick is to make sure they are low enough to be usable in a real-life application. 37 errors in a set of 1887 images is - as @Frank pointed out - actually not bad (almost 2% error rate) - I’ve seen real-world applications that were OK with more than twice that. At the end of the day it’s a question of the actual images and the application’s requirements.