I have an image of a number, as shown below.
I used adaptive thresholding to segment the number above into digits, detecting contours and keeping only bounding rectangles whose width and height are greater than 15, which produced the segmented digits below.
I want to segment the digits in the image above so that each digit is obtained individually, rather than the output above. After resizing to (28, 28), each result can then be fed to a CNN trained on MNIST for better prediction of the digit. So, is there any other neat way of segmenting this number in the image into individual digits?
One approach mentioned here suggests sliding a fixed-size (green) window and detecting digits with a trained neural network. How would such an NN be trained to classify digits? This approach avoids the OpenCV work of separating each individual digit, but wouldn't simply sliding a window over the whole image be somewhat expensive? And how should positive and negative examples be handled during training (should I create a separate dataset? The positive examples could be MNIST digits, but what about the negatives?)?
Segmentation:
import cv2
import numpy as np
import imutils
from skimage.segmentation import clear_border

img = cv2.imread('Image')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3, 3), 0)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
    cv2.THRESH_BINARY_INV, 7, 10)
# remove any components touching the image border
thresh = clear_border(thresh)
# find contours in the thresholded image, then initialize the
# list of group locations
groupCnts = cv2.findContours(thresh.copy(), cv2.RETR_TREE,
    cv2.CHAIN_APPROX_SIMPLE)
groupCnts = groupCnts[0] if imutils.is_cv2() else groupCnts[1]
groupLocs = []
clone = np.dstack([gray.copy()] * 3)
# loop over the group contours
for (i, c) in enumerate(groupCnts):
    # compute the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)
    # only accept the contour region as a grouping of characters if
    # the ROI is sufficiently large
    if w >= 15 and h >= 15:
        print(i, (x, y, w, h))
        cv2.rectangle(clone, (x, y), (x + w, y + h), (255, 0, 0), 1)
        groupLocs.append((x, y, w, h))
Sliding window:
import time
import cv2
import joblib
import numpy as np
from skimage.feature import hog
from skimage.segmentation import clear_border

def sliding_window(image, stepSize, windowSize):
    # slide a window across the image, left to right, top to bottom
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

clf = joblib.load("digits_cls.pkl")  # MNIST-trained classifier
img = cv2.imread('Image', 0)
(winW, winH) = (22, 40)
cv2.imshow("Window0", img)
cv2.waitKey(1)
blur = cv2.GaussianBlur(img, (5, 5), 0)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY, 11, 2)
thresh = clear_border(thresh)
for (x, y, window) in sliding_window(img, stepSize=10, windowSize=(winW, winH)):
    # skip partial windows at the right and bottom edges
    if window.shape[0] != winH or window.shape[1] != winW:
        continue
    roi = thresh[y:y + winH, x:x + winW]
    roi = cv2.resize(roi, (28, 28), interpolation=cv2.INTER_AREA)
    roi = cv2.dilate(roi, (3, 3))
    cv2.imshow("Window1", roi)
    cv2.waitKey(1)
    roi_hog_fd = hog(roi, orientations=9, pixels_per_cell=(14, 14),
        cells_per_block=(1, 1), visualise=False)
    nbr = clf.predict(np.array([roi_hog_fd], 'float64'))
    print(nbr)
    # draw the current window position
    clone = img.copy()
    cv2.rectangle(clone, (x, y), (x + winW, y + winH), (0, 255, 0), 2)
    cv2.imshow("Window2", clone)
    cv2.waitKey(1)
    time.sleep(0.95)
Strange output (it even predicts on blank windows): 522637753787357777722
Separating connected digits:
import os
import cv2

# img: grayscale image of the connected digits
h, w = img.shape[:2]
iw = 15      # assumed width of a single digit slice
sw = 0       # start column of the current slice
dw = w       # width still to be processed
count = 0
while dw > 0:
    # let the final slice run to the image edge so no columns are lost
    ew = w if dw - iw < 0 else sw + iw
    new_img = img[:, sw:ew]
    new = os.path.join('amount/', 'amount_' + str(count) + '.png')
    cv2.imwrite(new, new_img)
    # move on to the next slice
    sw = sw + iw
    dw = dw - iw
    count = count + 1
Output:
I found a way to separate these connected digits and feed them to the MNIST-trained classifier, but the output is still not accurate.
The steps I used:
(i) Extract the first image.
(ii) Segment the first image into separate images, i.e. obtain the second image.
(iii) Check whether an image's width exceeds a certain threshold; if so, split it further to produce separate digits (for connected digits, as above).
(iv) Feed every individual digit obtained after step (iii) to the MNIST classifier to get a prediction for each digit from the reshaped image. Lengthy, right?
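The width check in step (iii) could be sketched as follows; `MAX_DIGIT_WIDTH` and `split_wide_roi` are hypothetical names of my own, and the equal-width split is only an approximation of where connected digits actually touch:

```python
import numpy as np

MAX_DIGIT_WIDTH = 22  # hypothetical width of a single handwritten digit

def split_wide_roi(roi, max_width=MAX_DIGIT_WIDTH):
    """Split an ROI wider than one digit into roughly equal-width slices."""
    h, w = roi.shape[:2]
    n = max(1, int(round(w / float(max_width))))  # estimated digit count
    step = w // n
    # the last slice absorbs the remainder so no columns are lost
    return [roi[:, i * step:(w if i == n - 1 else (i + 1) * step)]
            for i in range(n)]

# a 40x45 ROI is treated as two connected digits
roi = np.zeros((40, 45), dtype=np.uint8)
pieces = split_wide_roi(roi)
print([p.shape[1] for p in pieces])  # → [22, 23]
```

Each slice can then be resized to (28, 28) and passed to the classifier as in step (iv).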
Is there any other efficient way to convert the first image to digits directly (yes, I tried pytesseract too!)?
1 Answer
If you have enough time and resources, training a new neural network would be an elegant solution.
To separate each digit individually, you can try inverting the image intensities so that the handwriting is white and the background is black. Then project the values onto the horizontal axis (sum the pixel values in each column) and look for peaks. Each peak position should indicate a character position.
Extra smoothing of the projection profile should refine the character positions.
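A minimal sketch of that projection-profile idea, using a synthetic binary image in place of the inverted handwriting (the smoothing kernel size, the `distance` value, and the choice of SciPy's `find_peaks` are my assumptions, not from the answer):

```python
import numpy as np
from scipy.signal import find_peaks

# synthetic stand-in for the inverted, thresholded image:
# two white "characters" on a black background
img = np.zeros((40, 60), dtype=np.uint8)
img[5:35, 5:20] = 255    # first character
img[5:35, 35:50] = 255   # second character

# projection profile: sum the pixel values in each column
profile = img.sum(axis=0).astype(np.float64)

# extra smoothing of the profile before peak picking
kernel = np.ones(5) / 5.0
smoothed = np.convolve(profile, kernel, mode='same')

# peaks in the smoothed profile mark character positions
peaks, _ = find_peaks(smoothed, distance=10)
print(peaks)  # one peak per character, near each character's center
```

The valleys between the peaks then give the cut columns for slicing out each digit before resizing to (28, 28).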