Qin Kaijie / pdf-miner

Commit 6a3d1f2d, authored Apr 28, 2024 by 许瑞

feat: update remove overlap

parent 96d17cb0

Changes: 2
Showing 2 changed files with 14 additions and 5 deletions

magic_pdf/model/magic_model.py  +0 -3
magic_pdf/pre_proc/remove_bbox_overlap.py  +14 -2

magic_pdf/model/magic_model.py @ 6a3d1f2d

@@ -461,9 +461,6 @@ class MagicModel:
             blocks.append(block)
         return blocks
 
-    def get_model_list(self, page_no):
-        return self.__model_list[page_no]
-
 
 if __name__ == "__main__":
     drw = DiskReaderWriter(r"D:/project/20231108code-clean")

magic_pdf/pre_proc/remove_bbox_overlap.py @ 6a3d1f2d

@@ -3,9 +3,21 @@ from magic_pdf.libs.boxbase import _is_in_or_part_overlap, _is_in
 
 def _remove_overlap_between_bbox(spans):
     res = []
-    for v in spans:
+
+    keeps = [True] * len(spans)
+    for i in range(len(spans)):
+        for j in range(len(spans)):
+            if i == j:
+                continue
+            if _is_in(spans[i]["bbox"], spans[j]["bbox"]):
+                keeps[i] = False
+
+    for idx, v in enumerate(spans):
+        if not keeps[idx]:
+            continue
+
         for i in range(len(res)):
-            if _is_in(res[i]["bbox"], v["bbox"]) or _is_in(v["bbox"], res[i]["bbox"]):
+            if _is_in(v["bbox"], res[i]["bbox"]):
                 continue
             if _is_in_or_part_overlap(res[i]["bbox"], v["bbox"]):
                 ix0, iy0, ix1, iy1 = res[i]["bbox"]
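
The pre-pass this commit adds to `_remove_overlap_between_bbox` can be tried in isolation with the sketch below. It is an illustration only, not the project's code: `is_in` is a hypothetical stand-in for `_is_in` from `magic_pdf.libs.boxbase` (whose body is not shown in this diff) and assumes an inclusive "first box lies entirely within the second" test on `[x0, y0, x1, y1]` boxes. `drop_contained_spans` and the sample spans are likewise made up for the example.

```python
def is_in(inner, outer):
    """Hypothetical stand-in for _is_in: True if `inner` lies entirely within `outer`.

    Assumption: edges that touch count as inside (inclusive comparison).
    """
    ix0, iy0, ix1, iy1 = inner
    ox0, oy0, ox1, oy1 = outer
    return ox0 <= ix0 and oy0 <= iy0 and ix1 <= ox1 and iy1 <= oy1


def drop_contained_spans(spans):
    """Mirror the pre-pass from the diff: flag each span whose bbox sits inside another span's bbox."""
    keeps = [True] * len(spans)
    for i in range(len(spans)):
        for j in range(len(spans)):
            if i == j:
                continue
            if is_in(spans[i]["bbox"], spans[j]["bbox"]):
                keeps[i] = False
    # Only the spans still flagged True go on to the partial-overlap handling.
    return [span for span, keep in zip(spans, keeps) if keep]


spans = [
    {"bbox": [0, 0, 100, 20], "content": "outer"},
    {"bbox": [10, 5, 40, 15], "content": "nested"},    # fully inside "outer"
    {"bbox": [90, 0, 150, 20], "content": "partial"},  # only partially overlaps "outer"
]
kept = drop_contained_spans(spans)
print([s["content"] for s in kept])  # ['outer', 'partial']
```

Under the inclusive assumption above, two spans with identical bboxes each count as inside the other, so this pre-pass would drop both. Whether the real `_is_in` behaves that way depends on its implementation, which this commit does not show.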