Qin Kaijie / pdf-miner / Commits / 43581df6

Commit 43581df6, authored Mar 12, 2024 by 许瑞

feat: add remove bbox overlap

parent 61a0c62c

Changes: 1
Showing 1 changed file with 43 additions and 0 deletions

magic_pdf/pre_proc/remove_bbox_overlap.py  0 → 100644  (+43 -0)
from magic_pdf.libs.boxbase import _is_in_or_part_overlap, _is_in


def _remove_overlap_between_bbox(spans):
    res = []
    for v in spans:
        for i in range(len(res)):
            if _is_in(res[i]["bbox"], v["bbox"]):
                continue
            if _is_in_or_part_overlap(res[i]["bbox"], v["bbox"]):
                ix0, iy0, ix1, iy1 = res[i]["bbox"]
                x0, y0, x1, y1 = v["bbox"]

                diff_x = min(x1, ix1) - max(x0, ix0)
                diff_y = min(y1, iy1) - max(y0, iy0)

                if diff_x > diff_y:
                    if x1 >= ix1:
                        mid = (x0 + ix1) // 2
                        ix1 = min(mid, ix1)
                        x0 = max(mid + 1, x0)
                    else:
                        mid = (ix0 + x1) // 2
                        ix0 = max(mid + 1, ix0)
                        x1 = min(mid, x1)
                else:
                    if y1 >= iy1:
                        mid = (y0 + iy1) // 2
                        y0 = max(mid + 1, y0)
                        iy1 = min(iy1, mid)
                    else:
                        mid = (iy0 + y1) // 2
                        y1 = min(y1, mid)
                        iy0 = max(mid + 1, iy0)
                res[i]["bbox"] = [ix0, iy0, ix1, iy1]
                v["bbox"] = [x0, y0, x1, y1]

        res.append(v)
    return res


def remove_overlap_between_bbox(spans):
    return _remove_overlap_between_bbox(spans)
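The midpoint rule used in each branch above can be illustrated on a single axis. The sketch below is not part of the commit: `split_interval_pair` is a hypothetical helper name, and it assumes integer coordinates and two intervals that already overlap. It mirrors the `x1 >= ix1` / `else` logic, where `prev` plays the role of the box already in `res` and `new` the incoming span.

```python
def split_interval_pair(prev, new):
    """Trim two overlapping 1-D intervals so they no longer overlap.

    Hypothetical helper for illustration only; it mirrors the per-axis
    midpoint logic of the committed function on a single axis.
    """
    p0, p1 = prev
    n0, n1 = new
    if n1 >= p1:
        # `new` reaches at least as far as `prev`: cut at the midpoint
        # between the start of `new` and the end of `prev`.
        mid = (n0 + p1) // 2
        p1 = min(mid, p1)
        n0 = max(mid + 1, n0)
    else:
        # `new` ends before `prev` does: cut at the midpoint between the
        # start of `prev` and the end of `new`.
        mid = (p0 + n1) // 2
        p0 = max(mid + 1, p0)
        n1 = min(mid, n1)
    return (p0, p1), (n0, n1)


if __name__ == "__main__":
    print(split_interval_pair((0, 10), (6, 20)))
    print(split_interval_pair((6, 20), (0, 10)))
```

With integer inputs, the `mid + 1` offset leaves the two trimmed intervals one unit apart rather than touching, which is what keeps the resulting boxes strictly disjoint along the chosen axis.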