
Character segmentation and recognition of unevenly spaced digits


I have an image of a number, shown below.

[Image: the input number]

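I segmented the number above using adaptive thresholding, detected contours, and kept only the bounding rectangles whose width and height are at least 15 pixels, which gave the following segments: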

[Images: the five segmented regions]

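Instead of the output above, I want to segment the image so that each digit comes out on its own. After resizing each digit to (28, 28), it could then be fed to a CNN trained on MNIST, which should predict individual digits better.
So, is there a neat way of segmenting the number in this image into individual digits?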

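One approach mentioned here suggests sliding a fixed-size green window over the image and detecting digits with a trained neural network. How would that network be trained to classify digits? This approach avoids using OpenCV to separate each digit, but wouldn't sliding a window over the whole image be rather expensive? And how should positive and negative examples be handled during training? Should I create a separate dataset? The positive examples could be MNIST digits, but what about the negative ones?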
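One common way to handle negatives, sketched here under stated assumptions rather than as the method from the linked post: add an extra "background" class (the index 10 and the helper `make_negatives` are hypothetical) and fill it with windows that a sliding window really encounters but that do not contain one centred digit, such as a window straddling two neighbouring digits or an almost empty patch. The `digits` array is assumed to hold 28x28 MNIST-style images with white ink on black.

```python
import numpy as np

BACKGROUND = 10  # hypothetical extra class: "no single centred digit here"


def make_negatives(digits: np.ndarray, n: int, rng: np.random.Generator) -> np.ndarray:
    """Build n negative windows from a stack of positive digit images.

    The first half are 'straddling' windows: the right half of one digit placed
    next to the left half of another, which is what a sliding window sees when it
    sits between two digits. The rest are black patches with sparse salt noise.
    """
    h, w = digits.shape[1:]
    half = w // 2
    out = np.zeros((n, h, w), dtype=digits.dtype)
    n_straddle = n // 2
    for k in range(n_straddle):
        a, b = rng.integers(0, len(digits), size=2)
        out[k, :, :w - half] = digits[a][:, half:]   # right half of digit a
        out[k, :, w - half:] = digits[b][:, :half]   # left half of digit b
    # background patches: about 2% of the pixels switched on
    noise = rng.random((n - n_straddle, h, w)) < 0.02
    out[n_straddle:] = noise.astype(digits.dtype) * 255
    return out
```

Training would then use `np.concatenate([digits, negs])` as inputs, with the MNIST labels for the positives and `np.full(len(negs), BACKGROUND)` for the negatives, so the network can answer "no digit here" instead of being forced to pick one of 0 to 9.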

Segmentation:

import cv2
import imutils
import numpy as np
from skimage.segmentation import clear_border

img = cv2.imread('Image')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# light blur, then inverse adaptive threshold so the ink becomes white (255)
blur = cv2.GaussianBlur(gray, (3, 3), 0)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, 7, 10)
# remove blobs that touch the image border
thresh = clear_border(thresh)

# find contours in the thresholded image, then initialize the
# list of group locations
groupCnts = cv2.findContours(thresh.copy(), cv2.RETR_TREE,
                             cv2.CHAIN_APPROX_SIMPLE)
# grab_contours handles the different return values of OpenCV 2, 3 and 4
groupCnts = imutils.grab_contours(groupCnts)
groupLocs = []

# 3-channel copy of the gray image for drawing the boxes
clone = np.dstack([gray.copy()] * 3)
# loop over the group contours
for (i, c) in enumerate(groupCnts):
    # compute the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)
    # only accept the contour region as a grouping of characters if
    # the ROI is sufficiently large
    if w >= 15 and h >= 15:
        print(i, (x, y, w, h))
        cv2.rectangle(clone, (x, y), (x + w, y + h), (255, 0, 0), 1)
        groupLocs.append((x, y, w, h))

# contour order is arbitrary; sort the boxes left to right (reading order)
groupLocs.sort(key=lambda b: b[0])

Sliding window:

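import time

import cv2
import joblib
import numpy as np
from skimage.feature import hog
from skimage.segmentation import clear_border


def sliding_window(image, stepSize, windowSize):
    # minimal definition of the helper the loop below relies on:
    # yields (x, y, window) left to right, top to bottom
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])


clf = joblib.load("digits_cls.pkl")    # MNIST-trained classifier (HOG features)
img = cv2.imread('Image', 0)
winW, winH = (22, 40)
cv2.imshow("Window0", img)
cv2.waitKey(1)

blur = cv2.GaussianBlur(img, (5, 5), 0)
# BINARY_INV: MNIST digits are white on black, so the ink must be 255 here
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 2)
thresh = clear_border(thresh)
kernel = np.ones((3, 3), np.uint8)     # dilate() expects a kernel array, not a size tuple

for (x, y, window) in sliding_window(img, stepSize=10, windowSize=(winW, winH)):
    # skip partial windows at the right and bottom edges
    if window.shape[0] != winH or window.shape[1] != winW:
        continue
    roi = thresh[y:y + winH, x:x + winW]
    roi = cv2.resize(roi, (28, 28), interpolation=cv2.INTER_AREA)
    roi = cv2.dilate(roi, kernel)
    cv2.imshow("Window1", roi)
    cv2.waitKey(1)
    roi_hog_fd = hog(roi, orientations=9, pixels_per_cell=(14, 14),
                     cells_per_block=(1, 1), visualize=False)
    nbr = clf.predict(np.array([roi_hog_fd], 'float64'))
    print(nbr)

    # draw the current window on a colour copy of the input for visual debugging
    clone = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    cv2.rectangle(clone, (x, y), (x + winW, y + winH), (0, 255, 0), 2)
    cv2.imshow("Window2", clone)
    cv2.waitKey(1)
    time.sleep(0.95)  # slow the loop down so each window can be inspected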

Strange output (it predicts a digit even for blank windows): 522637753787357777722
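Predictions on blank windows are to be expected: a 10-class classifier has no way to answer "no digit", so it always returns one of 0 to 9. A minimal sketch of two cheap filters, assuming the window is binary with the ink non-zero and the background zero (MNIST polarity), and that `scores` is one row of per-class scores such as the `decision_function` output of a scikit-learn linear classifier. The helper names and both thresholds (`min_ink`, `min_margin`) are hypothetical and would need tuning on real windows.

```python
import numpy as np


def ink_fraction(roi: np.ndarray) -> float:
    """Fraction of foreground (non-zero) pixels in a binary window."""
    return float(np.count_nonzero(roi)) / roi.size


def pick_digit(roi: np.ndarray, scores: np.ndarray,
               min_ink: float = 0.05, min_margin: float = 0.5):
    """Return the predicted class index, or None if the window should be rejected."""
    if ink_fraction(roi) < min_ink:
        return None  # (almost) empty window: do not trust any prediction
    top2 = np.sort(scores)[-2:]
    if top2[1] - top2[0] < min_margin:
        return None  # the classifier does not clearly prefer one digit
    return int(np.argmax(scores))
```

In the sliding-window loop, `pick_digit(roi, clf.decision_function(feats)[0])` could stand in for `clf.predict(...)`, with `feats = np.array([roi_hog_fd], 'float64')`; windows that return `None` are simply skipped.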

Separating connected digits:

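import os

import cv2

# img: grayscale crop of one segmented region that holds several touching digits
h, w = img.shape[:2]
iw = 15                               # fixed slice width in pixels
os.makedirs('amount', exist_ok=True)  # cv2.imwrite returns False if the folder is missing
# cut the region into consecutive iw-wide slices; the last slice keeps the remainder
for count, sw in enumerate(range(0, w, iw)):
    new_img = img[:, sw:sw + iw]
    new = os.path.join('amount', 'amount_' + str(count) + '.png')
    cv2.imwrite(new, new_img)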

Output:

[Images: the seven slices produced by the code above]

I found a way to separate these connected digits and feed them to the MNIST-trained classifier, but the output is still inaccurate.
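Fixed 15-pixel slices cut straight through a digit whenever the handwriting is wider or narrower than 15 pixels. A sketch of a slightly more robust split, assuming a binary crop with the ink non-zero: estimate how many digits the crop holds from its width (the expected digit width of 15 pixels is borrowed from the slicing code and is an assumption), then move each evenly spaced cut to the column with the least ink within a small search window. `split_touching` and its `search` parameter are hypothetical names.

```python
import numpy as np


def split_touching(region: np.ndarray, digit_w: int = 15, search: int = 5) -> list:
    """Split a binary crop that may hold several touching digits at low-ink columns."""
    w = region.shape[1]
    k = max(1, int(round(w / digit_w)))          # estimated number of digits
    profile = np.count_nonzero(region, axis=0)   # ink pixels in each column
    cuts = [0]
    for j in range(1, k):
        guess = j * w // k                       # evenly spaced first guess
        lo = max(cuts[-1] + 1, guess - search)
        hi = min(w - 1, guess + search)
        # move the cut to the emptiest column near the guess
        cuts.append(lo + int(np.argmin(profile[lo:hi + 1])))
    cuts.append(w)
    return [region[:, a:b] for a, b in zip(cuts[:-1], cuts[1:])]
```

Each returned piece can then be normalised to 28x28 and classified on its own; a crop narrower than about 1.5 digit widths is returned unsplit.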

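The steps I used:
(i) Extract the first image.
(ii) Segment the first image into separate images, i.e. obtain the second set of images.
(iii) Check whether an image's width exceeds some threshold; if it does, split it further into individual digits (for connected digits like those above).
(iv) Feed all the individual digits obtained after step (iii) to the MNIST classifier to get a prediction for each reshaped image.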
Lengthy, right?
Is there a more efficient way to convert the first image directly into digits (yes, I tried pytesseract too!)?

1 Answer


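    If you have enough time and resources, training a new neural network would be an elegant solution.

    To separate each digit, you could try inverting the image intensities so that the handwriting is white and the background black. Then project the image onto the horizontal axis (sum the pixel values in each column) and look for peaks. Each peak should mark the position of a character, and the valleys between neighbouring peaks mark where to cut.

    Smoothing the projection profile should further refine the character positions.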
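    The projection idea from this answer, sketched in NumPy under the assumption of a grayscale image with dark ink on a light background: invert it, sum each column to get one value per x position, smooth the result with a moving average, and take the local maxima as character centres; the lowest point between two neighbouring peaks is a candidate cut. The function names, the window size `smooth` and the `min_height` threshold are hypothetical. `scipy.signal.find_peaks` (with its `height`, `distance` and `prominence` arguments) is a more complete peak finder if SciPy is available.

```python
import numpy as np


def column_profile(gray: np.ndarray, smooth: int = 5) -> np.ndarray:
    """Invert dark-ink-on-light image, sum each column, smooth with a moving average."""
    inv = 255 - gray.astype(np.float64)   # handwriting becomes bright, background dark
    profile = inv.sum(axis=0)              # one value per x position
    kernel = np.ones(smooth) / smooth
    return np.convolve(profile, kernel, mode="same")


def local_peaks(profile: np.ndarray, min_height: float) -> list[int]:
    """Indices higher than the left neighbour, not lower than the right one, above min_height."""
    peaks = []
    for x in range(1, len(profile) - 1):
        if (profile[x] >= min_height and profile[x] > profile[x - 1]
                and profile[x] >= profile[x + 1]):
            peaks.append(x)
    return peaks


def cut_points(profile: np.ndarray, peaks: list[int]) -> list[int]:
    """Lowest column between each pair of neighbouring peaks (first one on a flat run)."""
    return [p + int(np.argmin(profile[p:q])) for p, q in zip(peaks, peaks[1:])]
```

    A larger `smooth` merges the strokes of one digit into a single peak but can also merge two close digits, so it is worth tuning against the typical stroke spacing.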
