Qin Kaijie / pdf-miner
Commit a0135640
Authored Mar 15, 2024 by 赵小蒙
Parent: f10b4a50

Fix IndexError: list index out of range caused by spans being an empty list

Showing 2 changed files with 89 additions and 83 deletions (whitespace changes hidden)
magic_pdf/pre_proc/ocr_dict_merge.py  +29 -26
magic_pdf/pre_proc/ocr_span_list_modify.py  +60 -57
magic_pdf/pre_proc/ocr_dict_merge.py @ a0135640
...
@@ -24,6 +24,9 @@ def line_sort_spans_by_left_to_right(lines):
     return line_objects

 def merge_spans_to_line(spans):
+    if len(spans) == 0:
+        return []
+    else:
         # Sort by the y0 coordinate
         spans.sort(key=lambda span: span['bbox'][1])
...
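To illustrate why the guard in merge_spans_to_line matters, here is a minimal sketch. The function name merge_spans_to_line_sketch and the grouping rule are hypothetical; the hunk above elides the real body, so the assumption here is that it seeds the first line with spans[0] after sorting, which is the kind of access that raises IndexError on an empty list.

```python
def merge_spans_to_line_sketch(spans):
    """Hypothetical, simplified stand-in for merge_spans_to_line."""
    # Guard from the commit: an empty list has no first element to index.
    if len(spans) == 0:
        return []

    # Sort by the y0 coordinate (bbox is assumed to be [x0, y0, x1, y1]).
    spans.sort(key=lambda span: span['bbox'][1])

    lines = []
    current_line = [spans[0]]  # would raise IndexError without the guard
    for span in spans[1:]:
        # Assumed rule: a span joins the current line if its top edge (y0)
        # lies above the bottom edge (y1) of the line's first span.
        if span['bbox'][1] < current_line[0]['bbox'][3]:
            current_line.append(span)
        else:
            lines.append(current_line)
            current_line = [span]
    lines.append(current_line)
    return lines


a = {'bbox': [0, 0, 10, 10]}
b = {'bbox': [12, 1, 20, 10]}
c = {'bbox': [0, 20, 10, 30]}
grouped = merge_spans_to_line_sketch([c, a, b])
empty_result = merge_spans_to_line_sketch([])
```

With the guard in place, an empty input yields an empty list of lines instead of an exception, so callers can iterate over the result without a special case.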
magic_pdf/pre_proc/ocr_span_list_modify.py @ a0135640
...
@@ -77,7 +77,10 @@ def adjust_bbox_for_standalone_block(spans):
 def modify_y_axis(spans: list, displayed_list: list, text_inline_lines: list):
     # displayed_list = []
+    # If spans is empty, do nothing
+    if len(spans) == 0:
+        pass
+    else:
         spans.sort(key=lambda span: span['bbox'][1])
         lines = []
...
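The if/pass/else form in modify_y_axis moves the whole function body one indentation level deeper, which likely accounts for most of the 60 additions and 57 deletions reported for this file. A guard clause with an early return is one equivalent alternative that leaves the body where it was. The sketch below is hypothetical: modify_y_axis_sketch and what it records are invented for illustration, and it assumes the real function updates its list arguments in place rather than returning a value.

```python
def modify_y_axis_sketch(spans: list, text_inline_lines: list) -> None:
    """Hypothetical stand-in: records the top-most span's vertical extent."""
    # Guard clause: `not spans` is True for an empty list, so the function
    # returns before the spans[0] access below can raise IndexError.
    if not spans:
        return

    # Sort by the y0 coordinate (bbox is assumed to be [x0, y0, x1, y1]).
    spans.sort(key=lambda span: span['bbox'][1])
    first = spans[0]
    text_inline_lines.append((first['bbox'][1], first['bbox'][3]))


out_empty = []
modify_y_axis_sketch([], out_empty)

out_filled = []
modify_y_axis_sketch(
    [{'bbox': [0, 5, 1, 9]}, {'bbox': [0, 2, 1, 4]}],
    out_filled,
)
```

Both forms behave the same for an empty list; the early return only changes how much of the surrounding code has to be re-indented.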