Qin Kaijie / pdf-miner
Commit d3d627cd (Unverified)
Authored Oct 25, 2024 by Xiaomeng Zhao
Committed by GitHub, Oct 25, 2024

Merge pull request #786 from myhloli/fix-imgs-block
refactor(ocr): adjust OCR processing parameters

Parents: 25a6d4ba, 1807126e

Showing 2 changed files with 2 additions and 2 deletions
magic_pdf/model/pdf_extract_kit.py    +1 -1
magic_pdf/pre_proc/ocr_dict_merge.py  +1 -1

magic_pdf/model/pdf_extract_kit.py @ d3d627cd
...
@@ -83,7 +83,7 @@ def doclayout_yolo_model_init(weight):
     return model
 
-def ocr_model_init(show_log: bool = False, det_db_box_thresh=0.3, lang=None, use_dilation=True, det_db_unclip_ratio=2.4):
+def ocr_model_init(show_log: bool = False, det_db_box_thresh=0.3, lang=None, use_dilation=True, det_db_unclip_ratio=1.8):
     if lang is not None:
         model = ModifiedPaddleOCR(show_log=show_log, det_db_box_thresh=det_db_box_thresh, lang=lang, use_dilation=use_dilation, det_db_unclip_ratio=det_db_unclip_ratio)
     else:
...
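
To give a feel for what lowering `det_db_unclip_ratio` from 2.4 to 1.8 does, here is a minimal sketch. It assumes the usual DB-style post-processing rule, in which each detected text region is expanded outward by `area * unclip_ratio / perimeter`; the helper `unclip_offset` and the 100 x 20 px box are hypothetical and are not taken from this repository.

```python
def unclip_offset(width: float, height: float, unclip_ratio: float) -> float:
    """Offset distance by which a detected width x height text box is expanded.

    Assumption: the DB-style rule distance = area * unclip_ratio / perimeter.
    """
    area = width * height
    perimeter = 2 * (width + height)
    return area * unclip_ratio / perimeter


# Hypothetical 100 x 20 px text box, compared under the old and new defaults.
old_offset = unclip_offset(100, 20, 2.4)
new_offset = unclip_offset(100, 20, 1.8)
print(f"old: {old_offset:.1f} px, new: {new_offset:.1f} px")
```

Under that assumption the smaller ratio yields tighter boxes around each text region, which would be consistent with the source branch name `fix-imgs-block`, although the commit itself does not state the motivation.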

magic_pdf/pre_proc/ocr_dict_merge.py @ d3d627cd
...
@@ -49,7 +49,7 @@ def merge_spans_to_line(spans):
                 continue
 
             # If the current span overlaps the last span of the current line on the y-axis, add it to the current line
-            if __is_overlaps_y_exceeds_threshold(span['bbox'], current_line[-1]['bbox'], 0.6):
+            if __is_overlaps_y_exceeds_threshold(span['bbox'], current_line[-1]['bbox'], 0.5):
                 current_line.append(span)
             else:
                 # Otherwise, start a new line
...
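
The effect of relaxing the y-overlap threshold from 0.6 to 0.5 can be illustrated with a small sketch. The body of `__is_overlaps_y_exceeds_threshold` is not shown in this diff, so the helper below is a hypothetical stand-in: it assumes boxes are `(x0, y0, x1, y1)` and that the overlap is measured relative to the shorter of the two boxes.

```python
def y_overlap_exceeds(bbox1, bbox2, threshold):
    """Return True if the vertical overlap of two boxes exceeds `threshold`.

    Assumption: the ratio is overlap height divided by the shorter box height.
    """
    _, y0_a, _, y1_a = bbox1
    _, y0_b, _, y1_b = bbox2
    overlap = max(0, min(y1_a, y1_b) - max(y0_a, y0_b))
    min_height = min(y1_a - y0_a, y1_b - y0_b)
    if min_height <= 0:
        return False  # degenerate box: treat as not overlapping
    return overlap / min_height > threshold


# Two hypothetical spans, both 20 px tall, the second shifted down by 9 px.
# They share 11 px vertically, so the overlap ratio is 11 / 20 = 0.55.
last_span = (0, 0, 50, 20)
next_span = (60, 9, 100, 29)

print(y_overlap_exceeds(next_span, last_span, 0.6))  # old threshold
print(y_overlap_exceeds(next_span, last_span, 0.5))  # new threshold
```

With these example boxes the pair fails the old 0.6 threshold but passes the new 0.5 one, so slightly misaligned spans (for instance a superscript or a skewed scan) would now be merged into the same line instead of starting a new one.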