Qin Kaijie / pdf-miner

Commit 8f264082 authored Mar 13, 2024 by liukaiwen

Merge branch 'master' into dev-in-line-bbox

# Conflicts:
#   magic_pdf/pre_proc/ocr_span_list_modify.py

Parents: 21cfaf4c, 6f7aa890

Showing 1 changed file with 5 additions and 2 deletions.

magic_pdf/pre_proc/ocr_span_list_modify.py
from magic_pdf.libs.boxbase import calculate_overlap_area_in_bbox1_area_ratio, get_minbox_if_overlap_by_ratio
from magic_pdf.libs.boxbase import __is_overlaps_y_exceeds_threshold
from magic_pdf.libs.boxbase import calculate_overlap_area_in_bbox1_area_ratio, get_minbox_if_overlap_by_ratio, \
    __is_overlaps_y_exceeds_threshold
def remove_overlaps_min_spans(spans):
    # Remove the smaller spans among overlapping spans
...
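
The body of `remove_overlaps_min_spans` is collapsed in this diff, so the following is only a minimal sketch of the idea its name and comment suggest. It assumes bboxes are `[x0, y0, x1, y1]` and that the smaller span of a pair is dropped when their intersection covers more than a hypothetical `ratio` of its area. `overlap_ratio_in_bbox1` is a hypothetical stand-in for `calculate_overlap_area_in_bbox1_area_ratio`, whose real implementation is not shown here.

```python
# Hypothetical sketch: the real magic_pdf helpers are not shown in this diff.
def overlap_ratio_in_bbox1(bbox1, bbox2):
    """Return the intersection area divided by bbox1's area (0.0 if no overlap)."""
    x0 = max(bbox1[0], bbox2[0])
    y0 = max(bbox1[1], bbox2[1])
    x1 = min(bbox1[2], bbox2[2])
    y1 = min(bbox1[3], bbox2[3])
    if x1 <= x0 or y1 <= y0:
        return 0.0
    area1 = (bbox1[2] - bbox1[0]) * (bbox1[3] - bbox1[1])
    if area1 <= 0:
        return 0.0
    return (x1 - x0) * (y1 - y0) / area1


def bbox_area(bbox):
    """Area of an [x0, y0, x1, y1] box, clamped at zero for degenerate boxes."""
    return max(0, bbox[2] - bbox[0]) * max(0, bbox[3] - bbox[1])


def remove_smaller_overlapping_spans(spans, ratio=0.65):
    """Drop the smaller span of each pair whose overlap covers more than `ratio` of it.

    `ratio=0.65` is an assumed threshold, not taken from magic_pdf.
    """
    dropped = set()
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            # Decide which span of the pair is smaller; on a tie, span i is the candidate.
            if bbox_area(spans[i]["bbox"]) <= bbox_area(spans[j]["bbox"]):
                small, large = i, j
            else:
                small, large = j, i
            if overlap_ratio_in_bbox1(spans[small]["bbox"], spans[large]["bbox"]) > ratio:
                dropped.add(small)
    # Keep the surviving spans in their original order.
    return [span for k, span in enumerate(spans) if k not in dropped]
```

Measuring the overlap against the smaller box (rather than intersection over union) means a small span fully nested inside a large one is always removed, which matches the "remove the smaller ones" comment.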
@@ -58,7 +61,7 @@ def modify_y_axis(spans: list, displayed_list: list, text_inline_lines: list):
    line_first_y0 = spans[0]["bbox"][1]
    line_first_y = spans[0]["bbox"][3]
    #used for searching interline equations
    # used for searching interline equations
    # text_inline_lines = []
    for span in spans[1:]:
        # if span.get("content","") == "78.":
...
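
In `modify_y_axis`, the first span's `bbox[1]` and `bbox[3]` (its top and bottom, assuming `[x0, y0, x1, y1]` bboxes) anchor the current line while the loop walks `spans[1:]`. A minimal sketch of that line-grouping pattern follows. `y_overlap_exceeds` is a hypothetical stand-in for `__is_overlaps_y_exceeds_threshold`, and the 0.5 threshold and the sort by `y0` are assumptions rather than magic_pdf's actual behavior.

```python
def y_overlap_exceeds(bbox1, bbox2, threshold=0.5):
    """True if the vertical overlap exceeds `threshold` of the shorter box's height."""
    overlap = max(0, min(bbox1[3], bbox2[3]) - max(bbox1[1], bbox2[1]))
    min_height = min(bbox1[3] - bbox1[1], bbox2[3] - bbox2[1])
    return min_height > 0 and overlap / min_height > threshold


def group_spans_into_lines(spans, threshold=0.5):
    """Group spans into lines by comparing each span with the first span of the current line."""
    if not spans:
        return []
    spans = sorted(spans, key=lambda s: s["bbox"][1])  # top to bottom by y0
    lines = []
    current_line = [spans[0]]
    for span in spans[1:]:
        # The first span's bbox plays the role of line_first_y0 / line_first_y.
        line_first_bbox = current_line[0]["bbox"]
        if y_overlap_exceeds(span["bbox"], line_first_bbox, threshold):
            current_line.append(span)
        else:
            lines.append(current_line)
            current_line = [span]
    lines.append(current_line)
    return lines
```

Comparing against the line's first span, rather than the most recent one, keeps a gradually drifting chain of spans from being merged into a single tall "line".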
@@ -67,7 +70,7 @@ def modify_y_axis(spans: list, displayed_list: list, text_inline_lines: list):
        # image and table types: same as above
        if span['type'] in ["displayed_equation", "image", "table"] or any(s['type'] in ["displayed_equation", "image", "table"] for s in current_line):
            #pass in
            # pass in
            if span["type"] in ["displayed_equation", "image", "table"]:
                displayed_list.append(span)
            # then start a new line
...
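
This hunk shows that a span of type `displayed_equation`, `image`, or `table`, or any span that follows one on the current line, ends the current line, and that the block-level span itself is recorded in `displayed_list`. The sketch below reproduces only that branch under simplified assumptions: the `"text"` type used in the usage example is hypothetical, and the y-coordinate handling of the real `modify_y_axis` is omitted.

```python
DISPLAYED_TYPES = {"displayed_equation", "image", "table"}


def split_lines_on_displayed(spans):
    """Return (lines, displayed_list); block-level spans always sit on a line of their own start."""
    displayed_list = []
    lines = []
    current_line = []
    for span in spans:
        if span["type"] in DISPLAYED_TYPES or any(s["type"] in DISPLAYED_TYPES for s in current_line):
            if span["type"] in DISPLAYED_TYPES:
                displayed_list.append(span)
            # Then start a new line: close the current one and begin with this span.
            if current_line:
                lines.append(current_line)
            current_line = [span]
        else:
            current_line.append(span)
    if current_line:
        lines.append(current_line)
    return lines, displayed_list
```

For example, spans typed text, displayed_equation, text, text yield three lines (the first text span, the equation alone, then the two remaining text spans), with only the equation in `displayed_list`.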