Unverified commit 041b9465, authored by Xiaomeng Zhao, committed by GitHub

fix(pdf-extract): adjust box threshold for OCR detection (#447)

Tuned the detection box threshold (det_db_box_thresh) in the OCR model initialization to improve
the accuracy of text extraction from images. The threshold was lowered from 0.6 to 0.3, so
detection boxes with lower confidence scores are now kept instead of being discarded. This is
expected to recover text regions that were previously missed, at the cost of possibly admitting
more low-confidence boxes.
parent 3da5c411
@@ -139,7 +139,7 @@ class CustomPEKModel:
 )
 # initialize OCR
 if self.apply_ocr:
-    self.ocr_model = ModifiedPaddleOCR(show_log=show_log)
+    self.ocr_model = ModifiedPaddleOCR(show_log=show_log, det_db_box_thresh=0.3)
 # init structeqtable
 if self.apply_table:
......
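
To illustrate what the threshold change does, here is a minimal sketch of score-based box filtering. It is not the code inside ModifiedPaddleOCR: the function name filter_boxes, the Box type, and the sample scores are all hypothetical, and the sketch assumes a box is kept when its score is at or above the threshold.

```python
from typing import List, Tuple

# Hypothetical representation: (description, detection confidence score).
Box = Tuple[str, float]


def filter_boxes(boxes: List[Box], box_thresh: float) -> List[Box]:
    """Keep boxes whose score is at or above box_thresh (assumed rule)."""
    return [box for box in boxes if box[1] >= box_thresh]


# Made-up candidates for illustration only.
candidates: List[Box] = [
    ("title", 0.92),
    ("body text", 0.71),
    ("faint footnote", 0.45),
    ("speckle", 0.12),
]

kept_old = filter_boxes(candidates, box_thresh=0.6)  # previous value
kept_new = filter_boxes(candidates, box_thresh=0.3)  # value set in this commit

print([name for name, _ in kept_old])
print([name for name, _ in kept_new])
```

In this toy input the lower threshold additionally keeps the faint footnote while still rejecting the lowest-scoring box. Whether real documents behave the same way depends on the detector's actual score distribution.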