Quality grading with Polimago

In the past I’ve only used Polimago for classification purposes, wholly disregarding the option to have a continuous grading. I was wondering how sensitive the provided measurement points are, and how strict the requirements on them are.

This sounds a bit vague, so a quick example should clear things up. Take for example a large set of products which, due to their production process, can range from perfect (score of 1) to absolutely abysmal (score of 0). In the deterministic case, if I later want to change the cutoff point, I would have to relabel my data and create a new model. But if I can somehow give each product a grading, and have Polimago assess each new product and return a grading on that same scale, I can change the cutoff point later on.
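
To make the cutoff idea concrete, a minimal sketch in plain NumPy (not Polimago; the grade values are simply made up) of how a continuous grade would let me move the accept/reject threshold later without touching the model:

```python
import numpy as np

# Hypothetical continuous grades as a regression model might return them,
# one value per inspected product, roughly in the range 0..1.
grades = np.array([0.95, 0.12, 0.71, 0.48, 0.88, 0.33])

# With binary labels I would have to relabel and retrain to move the cutoff;
# with continuous grades it is just a threshold comparison.
for cutoff in (0.4, 0.5, 0.6):
    accepted = grades >= cutoff
    print(f"cutoff {cutoff}: accept {accepted.sum()} of {grades.size} products")
```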

Now the real question is: how “smooth” does the assessment need to be? There is likely to be quite a lot of noise on the rather subjective quality measurement. Is this something I can overcome by sampling a lot? Or is it better to have several people judge the same samples and average their scores before feeding them to Polimago?


That’s actually a fairly tricky one…

Maybe let’s start at the end of your question: I think that even with the smoothest of estimates you’d have to expect some degree of variation in the output values. Assume you can unambiguously group your images into three quality groups and assign labels to them (0.0, 0.5, and 1.0) without any trace of doubt (a situation I have yet to encounter…), and then run all your images through the resulting regression predictor (which is effectively what a leave-out test does…): the output values will not exclusively be 0.0, 0.5, and 1.0. We’re dealing with a statistical method on input data that is subject to things like noise, distortion etc. - all of that will propagate into your results. So the best you can expect in a sufficiently benign scenario are results like 0.01, -0.09, 0.45, 1.1, … (yes, you read correctly - the results are not clamped to the range of input values that you provided…).
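
None of this is Polimago-specific; the unclamped behaviour is easy to reproduce with any generic kernel regressor. A sketch using scikit-learn’s KernelRidge on synthetic data (all features, parameters and noise levels here are invented for illustration):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)

# Synthetic samples: an underlying quality value per sample, noisy "features"
# derived from it, and labels quantised to the three grades 0.0 / 0.5 / 1.0.
quality = rng.uniform(0.0, 1.0, size=200)
features = np.column_stack([quality + rng.normal(0, 0.08, 200),
                            quality + rng.normal(0, 0.08, 200)])
labels = np.round(quality * 2) / 2                 # only 0.0, 0.5 or 1.0

model = KernelRidge(kernel="rbf", alpha=0.1, gamma=2.0).fit(features, labels)
predictions = model.predict(features)

print("label values used for training:", np.unique(labels))
print("prediction range:", predictions.min(), "to", predictions.max())
# Nothing forces the predictions to stay within [0, 1] or on the three label
# values - they are continuous and can fall slightly outside the label range.
```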

As might be expected, this will get worse if there is inherent ambiguity in the regression task. Imagine an image that, when given to five different human operators, will yield five different responses - a situation that is not exactly unlikely in practice. This will add to the variance of your input data and to that of the regression predictor’s output (so instead of the benign case before, where the outputs were still reasonably close to what you specified for the input images, you might now end up with results like 0.15, 0.3, 0.74, …).

Ultimately, this inherent variance/ambiguity in the training data will always be reflected in the performance of the resulting predictor - not just in the regression case but also with classification and search predictors. Working with “smooth” assessments, as you pointed out, will of course help in that it narrows down the variance involved, but it’s not realistically possible to bring it down to zero as long as you are not working on entirely artificial data. More data will help up to a point, but the point at which this saturates depends on the quality of the input data.

This will be detectable e.g. in the leave-out test for regression training sets. A benign situation would look similar to the first graph, while one with a higher variance would look more like the second one (pictures courtesy of @Phil):
[graph 1: leave-out test result with low variance]
[graph 2: leave-out test result with higher variance]
(in both graphs the x axis corresponds to the grading of the input data - which is why there are only three discrete steps - whereas the y axis corresponds to the output of the resulting regression predictor for the very same input data; both graphs have been generated on artificial data with variances selected to illustrate my point)
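
For anyone who wants to reproduce this kind of comparison without Polimago, a rough sketch on artificial data (scikit-learn as a stand-in, all parameters invented): the per-grade spread of the cross-validated predictions plays the role of the vertical scatter in the two graphs.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
grades = np.repeat([0.0, 0.5, 1.0], 100)        # three discrete input gradings

for feature_noise in (0.05, 0.25):              # benign case vs. high-variance case
    # Features that carry the grade plus noise; more noise means more ambiguity.
    features = np.column_stack([grades + rng.normal(0, feature_noise, grades.size)
                                for _ in range(3)])
    model = KernelRidge(kernel="rbf", alpha=0.1, gamma=1.0)
    predicted = cross_val_predict(model, features, grades, cv=10)   # leave-out style
    for g in (0.0, 0.5, 1.0):
        spread = predicted[grades == g].std()
        print(f"noise {feature_noise}: grade {g} -> prediction std {spread:.3f}")
```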

Having said that: Regression predictors can perform really well (if they did not, we would not have reliable search predictors because effectively the search predictor is nothing else but a variably-sized mob of regression predictors). But it’s generally important to keep in mind that the regression predictors are - like most other things - subject to the garbage-in-garbage-out principle (or, in this case, rather variance-in-variance-out).

From what I have seen in the past, I would recommend the following:

  • If you have input data with…

    • … a substantial number (let’s say 5 or more) of discernible quality stages (i.e. not just “0” and “1”)
    • … a reasonably small ambiguity in judgement of these quality stages

    then a regression predictor might be a viable approach for what you want to do.

  • If any one of these prerequisites is not given, I would actually recommend working with a classification predictor instead, because it will give you more data to work with: as output you’ll get the confidences for the assignment to each of your quality groups (whereas regression gives you exactly one value and no indication of confidence; a rough sketch of what that extra information buys you follows below). If the latter comes as a surprise, remember that the regression predictor stands alone for each dimension/degree of freedom for which you trained one, while the classification predictor effectively has one predictor available for each combination of two classes in your training data.
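
As a rough illustration of “more data to work with” (plain NumPy, the confidence vectors are invented rather than actual Polimago output): with per-class confidences the accept/reject decision can still be tuned after the fact, e.g. by thresholding the summed confidence of the classes you consider “good”.

```python
import numpy as np

quality_classes = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # five hypothetical quality groups

# Invented confidence vectors, one row per inspected product.
confidences = np.array([
    [0.05, 0.10, 0.15, 0.30, 0.40],   # leaning towards the good end
    [0.45, 0.30, 0.15, 0.07, 0.03],   # leaning towards the bad end
    [0.10, 0.20, 0.40, 0.20, 0.10],   # middling / ambiguous
])

good = quality_classes >= 0.75                    # which groups count as "good"
good_confidence = confidences[:, good].sum(axis=1)

for cutoff in (0.3, 0.5):                         # the decision rule can be tuned later
    print(f"cutoff {cutoff}:", np.where(good_confidence >= cutoff, "accept", "reject"))
```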


Ah, lovely! I keep pressing the like button but I can only give you one… :heart:

This matches my gut feeling, so it’s good to see my intuition wasn’t completely off the mark. No human-based grading is of course perfect, but it’s still an option (while being aware of the inherent dangers) if one is careful about it.

The rule of thumb regarding the number of classes also helps a lot. I dreaded having to find that out myself the hard way! Could you even do, for lack of a better word, subpixel accuracy? Having 5 classes (1 to 5) and using the confidences, would it be reasonable to predict that something belongs to class 4.5 if the confidences for classes 4 and 5 are identical?

Could be I am thoroughly abusing the algorithm, but I’ve found that poking the boundary conditions of various approaches generally results in a good understanding of the inner workings of said approach.


stop it! you’re making me blush! :blush:

Generally, if you are looking at an image for which the classification yields the same confidence values for two different classes A and B, the image may be considered to look like “A” just as much as it looks like “B”. I have not yet experimented with image sets suitable to either confirm or disprove your speculation, and will therefore do neither at this point :slightly_smiling_face:

I don’t think Polimago will be able to do your subclassification. While I see where your idea is coming from, and it might work for a 1D regression, it will most likely fail in higher dimensions. The neighborhood information you are looking at and want to use for sublevel classification will not exist in the feature space. If you find that an image is likely both A and B, there might not be an “A.5” class. I will give you a short example to try to illustrate my point.

Recognizing handwritten digits can be done using Polimago. Take a look at the MNIST data; a snippet of it is shown below.
[image: sample of MNIST handwritten digits]

We can view this example as a regression that we divide into 10 classes (0-9). A typical error, depending on how well the model is trained, is mixing up 7 and 9. These two groups contain examples that are very close to each other, e.g. a 9 that is 70% a 9 and 65% a 7. If we now applied your subclass idea, we would end up concluding the number is an 8, which is a different class altogether. So 7 and 9 are not necessarily neighbors in your input space, but they are in the feature space. Since the feature space is much larger there will be more than one neighbor, and the classes might not even be convex sets, let alone closed ones. We could have two ellipses that contain 7s but don’t overlap.
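
The 7/9 example in a few lines of plain Python (the confidence numbers are of course made up): a confidence-weighted average of the class labels lands near 8, a digit the image never resembled, which is exactly the problem with treating nominal class labels as an interpolation axis.

```python
import numpy as np

digits = np.arange(10)            # class labels 0..9 - nominal, not an ordinal scale
confidences = np.zeros(10)
confidences[9] = 0.70             # the image really is a 9 ...
confidences[7] = 0.65             # ... but it also looks a lot like a 7

interpolated = np.average(digits, weights=confidences)
print(interpolated)               # ~8.04 -> "8", a class the image does not belong to
```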

In short: your idea sounds reasonable for a certain kind of example, like the one in @illusive’s pictures, but it will most likely fail for higher-dimensional inputs, e.g. images.


Ah, no, of course if the images represent classes then doing interpolation makes very little sense, as there is no 7.5 when detecting whether something is a specific digit.

But say, for example, you have a quality grading: some products have quality 0, some quality 0.5 and some 1. In this case there is good cause to expect that there will be products that deserve a grade of 0.25, even though I never trained any class labeled 0.25.
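
If the class labels really are points on a single ordinal quality scale, the interpolation idea boils down to something like an expectation over the class confidences. A sketch of what I have in mind (the confidence numbers are invented, not Polimago output):

```python
import numpy as np

grade_classes = np.array([0.0, 0.5, 1.0])     # the three trained quality labels

# Invented confidences for a product that sits between "0" and "0.5".
confidences = np.array([0.55, 0.40, 0.05])

estimated_grade = np.average(grade_classes, weights=confidences)
print(estimated_grade)                        # ≈ 0.25 - a grade no class was ever labeled with
```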

This was a random idea from me, so if I come across a nice dataset I will keep it in mind, give it a shot and post the results here. The proof of the pudding is in the eating, after all!