Commit 2c75a375 (Unverified)
Authored Nov 04, 2024 by Xiaomeng Zhao
Committed by GitHub, Nov 04, 2024
Merge pull request #855 from myhloli/add-structeqtable
feat(model): add HTML minification to StructTableModel
Parents: dc31c97b, b5117e72
Showing 1 changed file with 14 additions and 0 deletions
magic_pdf/model/pek_sub_modules/structeqtable/StructTableModel.py
import re

import torch
from struct_eqtable import build_model
...
@@ -28,4 +30,16 @@ class StructTableModel:
                images, output_format=output_format
            )

        if output_format == "html":
            results = [self.minify_html(html) for html in results]

        return results

    def minify_html(self, html):
        # Collapse runs of whitespace into a single space
        html = re.sub(r'\s+', ' ', html)
        # Remove whitespace around '>' (after tags)
        html = re.sub(r'\s*>\s*', '>', html)
        # Remove whitespace around '<' (before tags)
        html = re.sub(r'\s*<\s*', '<', html)
        return html.strip()
\ No newline at end of file
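To illustrate what the three substitutions do to a table, here is a minimal sketch that copies the regexes from `minify_html` into a standalone function. The free-function form and the sample strings are assumptions made for illustration; in the commit the logic is a method on `StructTableModel`.

```python
import re


def minify_html(html: str) -> str:
    """Standalone copy of the three regex passes from the commit (illustrative)."""
    # Pass 1: collapse every run of whitespace (including newlines) to one space.
    html = re.sub(r'\s+', ' ', html)
    # Pass 2: drop whitespace on either side of each '>'.
    html = re.sub(r'\s*>\s*', '>', html)
    # Pass 3: drop whitespace on either side of each '<'.
    html = re.sub(r'\s*<\s*', '<', html)
    return html.strip()


# Hypothetical sample input, not taken from the repository.
sample = """
<table>
  <tr>
    <td> cell one </td>
    <td>two words</td>
  </tr>
</table>
"""

minified = minify_html(sample)
print(minified)

# Whitespace next to inline tags is removed as well.
inline = minify_html("a <b>bold</b> word")
print(inline)
```

Spaces inside text are kept (`two words`), while indentation and padding next to tags are removed. The second call shows a side effect worth knowing: whitespace adjacent to inline tags is also dropped, so words around a `<b>` element are joined. That is likely harmless for structural table markup but could matter if cells contain inline formatting.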