That’s actually a fairly tricky one…
Maybe let’s start at the end of your question: I think that even with the smoothest of estimates you’d have to expect some degree of variation in the output values. Assume you can group your images into three quality groups and assign labels to them (0.0, 0.5, and 1.0) without any trace of ambiguity (a situation that I have yet to encounter…). If you then run all your images through the resulting regression predictor (this is effectively what a leave-out test does…), the output values will not exclusively be 0.0, 0.5, and 1.0. We’re dealing with a statistical method applied to input data that is subject to things like noise, distortion etc., and all of that will propagate into your results. So the best you can expect in a sufficiently benign scenario are results like 0.01, 0.09, 0.45, 1.1, … (yes, you read correctly: the results are not clamped to the range of input values that you provided…).
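To make the "not clamped" point a bit more tangible, here is a minimal sketch in plain Python. It is not the actual regression predictor: I simply assume a hypothetical one-dimensional image feature (the true grade plus Gaussian noise) and fit an ordinary least-squares line to it. The noise level, the sample count and all names are made up for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)  # fixed seed so the run is repeatable

# Three unambiguous quality grades, 100 samples each.
labels = [0.0, 0.5, 1.0] * 100

# Assumption: a single hypothetical feature that equals the grade plus noise.
features = [label + rng.gauss(0.0, 0.1) for label in labels]

slope, intercept = fit_line(features, labels)

# Run the training samples back through the fitted model.
predictions = [slope * x + intercept for x in features]

print("smallest output:", min(predictions))
print("largest output: ", max(predictions))
```

Even though only 0.0, 0.5 and 1.0 went in as labels, the outputs scatter around those values, and some of them land below 0.0 or above 1.0.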
As might be expected, this variation will get worse if there is inherent ambiguity in the regression task. Imagine an image that, when given to five different human operators, will yield five different responses (a situation that is not exactly unlikely in practice). This will add to the variance of your input data and to that of the regression predictor’s output. So instead of the benign case before, where the outputs were still reasonably close to what you specified for the input images, you might now end up with results like 0.15, 0.3, 0.74, …
Ultimately, this inherent variance/ambiguity in the input data for classifier training will always be reflected in the performance of the resulting predictor, not just in the regression case but also with classification and search predictors. Working with “smooth” assessments as you pointed out will of course help in that it will narrow down the variance involved, but it’s not realistically possible to bring it down to zero as long as you are not working on entirely artificial data. More data will help up to a point, but the point at which this saturates depends on the quality of the input data.
This will be detectable e.g. in the leave-out test for regression training sets. A benign situation would look similar to the first graph, while one with a higher variance would be closer to the second one (pictures courtesy of @Phil):
(in both graphs the x axis corresponds to the grading of the input data, which is why there are only three discrete steps, whereas the y axis corresponds to the output of the resulting regression predictor for the very same input data; both graphs have been generated on artificial data with variances selected to illustrate my point)
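If you want to reproduce the flavour of such graphs yourself, something like the following rough sketch would do. Again this is artificial data and a plain least-squares line standing in for the regression predictor; the leave-one-out loop, the two noise levels and the function names are my own assumptions and not what the tool does internally.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out(features, labels):
    """Predict every sample with a model fitted on all the other samples."""
    outputs = []
    for i in range(len(features)):
        xs = features[:i] + features[i + 1:]
        ys = labels[:i] + labels[i + 1:]
        slope, intercept = fit_line(xs, ys)
        outputs.append(slope * features[i] + intercept)
    return outputs


def spread_per_grade(noise, seed=0, per_grade=40):
    """Standard deviation of the leave-one-out outputs for each grade."""
    rng = random.Random(seed)
    grades = (0.0, 0.5, 1.0)
    labels = list(grades) * per_grade
    # Assumption: one hypothetical feature, the grade plus Gaussian noise.
    features = [g + rng.gauss(0.0, noise) for g in labels]
    outputs = leave_one_out(features, labels)
    return {
        g: statistics.stdev([o for o, l in zip(outputs, labels) if l == g])
        for g in grades
    }


benign = spread_per_grade(noise=0.05)  # narrow columns, like the first graph
noisy = spread_per_grade(noise=0.3)    # wide columns, like the second graph

for grade in (0.0, 0.5, 1.0):
    print(grade, round(benign[grade], 3), round(noisy[grade], 3))
```

The per-grade spread is what shows up as the height of each of the three columns in the graphs: the noisier the input, the taller the columns.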
Having said that: Regression predictors can perform really well (if they did not, we would not have reliable search predictors, because effectively the search predictor is nothing else but a variably-sized mob of regression predictors). But it’s generally important to keep in mind that the regression predictors are, like most other things, subject to the garbage-in-garbage-out principle (or, in this case, rather variance-in-variance-out).
From what I have seen in the past, I would recommend the following:

If you have input data with…
 … a substantial number (let’s say 5 or more) of discernible quality stages (i.e. not just “0” and “1”)
 … a reasonably small ambiguity in judgement of these quality stages
then a regression predictor might be a viable approach for what you want to do.

If any one of these prerequisites is not met, I would actually recommend working with a classification predictor instead, because it will give you more data to work with: as an output you’ll get the confidences for the grouping into each of your quality groups (whereas regression would give you exactly one value and no indication of confidence). If the latter comes as a surprise, remember that the regression predictor stands alone for each dimension/degree of freedom for which you trained one, while the classification predictor effectively has one predictor available for each combination of two classes in your training data.
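To illustrate the "one predictor per combination of two classes" idea, here is a toy sketch. It is emphatically not how the classification predictor computes its confidences: I just let each pair of classes vote for whichever hypothetical class centre is nearer to a one-dimensional value, and report the share of votes per class. Class names, centres and the voting rule are all assumptions for the sake of the example.

```python
from itertools import combinations


def pairwise_predictors(classes):
    """Every combination of two classes, i.e. k * (k - 1) / 2 pairs."""
    return list(combinations(classes, 2))


def vote_shares(value, class_centres):
    """Toy pairwise voting: each pair votes for the class with the nearer centre."""
    votes = {name: 0 for name in class_centres}
    pairs = pairwise_predictors(sorted(class_centres))
    for a, b in pairs:
        dist_a = abs(value - class_centres[a])
        dist_b = abs(value - class_centres[b])
        winner = a if dist_a <= dist_b else b
        votes[winner] += 1
    # Share of pairwise votes per class (a stand-in for a real confidence).
    return {name: count / len(pairs) for name, count in votes.items()}


# Hypothetical quality groups with made-up centres on a single feature axis.
centres = {"bad": 0.0, "medium": 0.5, "good": 1.0}

pairs = pairwise_predictors(sorted(centres))
shares = vote_shares(0.45, centres)

print(len(pairs), "pairwise predictors:", pairs)
print("vote shares:", shares)
```

With three classes you get three pairwise decisions, and the result is one number per class rather than a single value. That is the extra information I meant: you can see not only which group wins but also how the others fared.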