Qin Kaijie / pdf-miner
Commit 778b1fb7, authored Apr 23, 2024 by liukaiwen
Updated para_split
parent bb2bf065
Showing 1 changed file with 13 additions and 9 deletions
magic_pdf/para/para_split_v2.py
...
@@ -87,17 +87,21 @@ def __detect_list_lines(lines, new_layout_bboxes, lang):
"""
     for l in lines:
         first_char = __get_span_text(l['spans'][0])[0]
-        layout_left = __find_layout_bbox_by_line(l['bbox'], new_layout_bboxes)[0]
-        if l['bbox'][0] == layout_left:
-            if first_char.isupper() or first_char.isdigit():
-                line_fea_encode.append(1)
-            else:
-                line_fea_encode.append(4)
+        layout = __find_layout_bbox_by_line(l['bbox'], new_layout_bboxes)
+        if not layout:
+            line_fea_encode.append(0)
         else:
-            if first_char.isupper():
-                line_fea_encode.append(2)
+            layout_left = layout[0]
+            if l['bbox'][0] == layout_left:
+                if first_char.isupper() or first_char.isdigit():
+                    line_fea_encode.append(1)
+                else:
+                    line_fea_encode.append(4)
             else:
-                line_fea_encode.append(3)
+                if first_char.isupper():
+                    line_fea_encode.append(2)
+                else:
+                    line_fea_encode.append(3)
     # Then segment by these codes: lines where codes 1, 2, 3 occur consecutively at least twice are treated as a list.
...
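
The per-line encoding after this change can be read as a small pure function. The following is a minimal sketch, not the project's code: `encode_line` is a hypothetical name, the bboxes are assumed to be `(x0, y0, x1, y1)` tuples, and `__find_layout_bbox_by_line` is assumed to return a falsy value (such as `None`) when no layout box contains the line, which is what the new `if not layout` guard suggests.

```python
from typing import Optional, Sequence


def encode_line(first_char: str,
                line_bbox: Sequence[float],
                layout: Optional[Sequence[float]]) -> int:
    """Return the feature code for one line, mirroring the branches in the diff.

    Assumption: index 0 of a bbox is its left edge (x0).
    """
    if not layout:
        # New in this commit: no enclosing layout box was found.
        return 0
    layout_left = layout[0]
    if line_bbox[0] == layout_left:
        # Line starts exactly at the layout's left edge.
        return 1 if first_char.isupper() or first_char.isdigit() else 4
    # Line starts elsewhere (for example, indented).
    return 2 if first_char.isupper() else 3


# Hypothetical sample data, chosen only to exercise each branch.
layout = (50, 0, 500, 700)
codes = [
    encode_line("1", (50, 100, 400, 112), layout),  # flush left, digit
    encode_line("a", (50, 114, 400, 126), layout),  # flush left, lowercase
    encode_line("T", (70, 128, 400, 140), layout),  # not flush left, uppercase
    encode_line("t", (70, 142, 400, 154), layout),  # not flush left, lowercase
    encode_line("X", (600, 156, 700, 168), None),   # no layout found
]
print(codes)
```

Before the change, the last case would have indexed `[0]` on the lookup result directly; if that result can be `None`, as assumed here, that indexing would raise a `TypeError`, which is presumably what the guard avoids.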