Qin Kaijie / pdf-miner
Commit d67af17f, authored Jul 13, 2024 by quyuan

add ci

Parent: 7560e128
Changes: 2
Showing 2 changed files with 1 addition and 4 deletions (+1 -4)
  tests/test_cli/lib/calculate_score.py  +1 -1
  tests/test_cli/lib/pre_clean.py        +0 -3
tests/test_cli/lib/calculate_score.py @ d67af17f
@@ -4,12 +4,12 @@ calculate_score
 import os
 import re
 import json
+from Levenshtein import distance
 from lib import scoring
 from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
 from nltk.tokenize import word_tokenize
 import nltk
 nltk.download('punkt')
-from Levenshtein import distance
 class Scoring:
     """
...
tests/test_cli/lib/pre_clean.py @ d67af17f
@@ -118,9 +118,6 @@ def clean_data(prod_type, download_dir):
     with open(input_file, 'r', encoding='utf-8') as fr:
         content = fr.read()
         new_content = clean_markdown_images(content)
-        new_content = convert_html_table_to_md(new_content)
-        new_content = convert_latext_to_md(new_content)
-        new_content = convert_htmltale_to_md(new_content)
         with open(output_file, 'w', encoding='utf-8') as fw:
             fw.write(new_content)
...
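
Note on the pre_clean.py hunk: after this commit, the visible part of clean_data reads the input file, applies clean_markdown_images, and writes the result. The three removed converter calls no longer run. The sketch below illustrates that remaining read, clean, write flow under stated assumptions. The body of clean_markdown_images is not shown in this diff, so the regex version here is a hypothetical stand-in, and clean_file is an illustrative helper name rather than a function from the repository.

```python
import re

# Hypothetical stand-in: the real clean_markdown_images is not part of this
# diff. This version assumes it strips Markdown image references.
_IMAGE_RE = re.compile(r'!\[[^\]]*\]\([^)]*\)')


def clean_markdown_images(content: str) -> str:
    """Remove Markdown image references such as ![alt](path.png)."""
    return _IMAGE_RE.sub('', content)


def clean_file(input_file: str, output_file: str) -> None:
    """Illustrative helper: read a file, strip images, write the result."""
    with open(input_file, 'r', encoding='utf-8') as fr:
        content = fr.read()
    # Only cleaning step left in the visible hunk after this commit.
    new_content = clean_markdown_images(content)
    with open(output_file, 'w', encoding='utf-8') as fw:
        fw.write(new_content)
```

The sketch closes the input file before opening the output file, which differs from the diff only in layout. The content written is the same either way.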