Qin Kaijie / pdf-miner
Commit 2fb4b2ef, authored Mar 20, 2024 by liusilu
add pdf tools
parent d3e6853a
Showing 1 changed file with 11 additions and 4 deletions

tests/overall_indicator.py (+11 -4)
@@ -22,9 +22,9 @@ def indicator_cal(json_standard,json_test):
     '''Overall dataset metrics'''
     a = json_test[['id', 'mid_json']]
-    b = json_standard[['id', 'mid_json']]
+    b = json_standard[['id', 'mid_json', 'pass_label']]
     outer_merge = pd.merge(a, b, on='id', how='outer')
-    outer_merge.columns = ['id', 'standard_mid_json', 'test_mid_json']
+    outer_merge.columns = ['id', 'standard_mid_json', 'test_mid_json', 'pass_label']
     standard_exist = outer_merge.standard_mid_json.apply(lambda x: not isnull(x))
     test_exist = outer_merge.test_mid_json.apply(lambda x: not isnull(x))
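The rename in this hunk is positional, so it relies on the column order that `pd.merge` produces: the key, then the left frame's columns (`a`, taken from `json_test`), then the right frame's columns (`b`, taken from `json_standard`). That order is worth checking against the labels assigned here. The following is a minimal sketch, not the project's code, that names each column by its source before merging; the helper name `merge_with_labels` and the sample data are hypothetical.

```python
import pandas as pd


def merge_with_labels(json_standard, json_test, how="inner"):
    """Hypothetical helper: merge on 'id', naming columns by source frame."""
    test = json_test[["id", "mid_json"]].rename(
        columns={"mid_json": "test_mid_json"}
    )
    standard = json_standard[["id", "mid_json", "pass_label"]].rename(
        columns={"mid_json": "standard_mid_json"}
    )
    # No positional rename is needed afterwards: the names travel with the data.
    return pd.merge(test, standard, on="id", how=how)


# Made-up sample data, for illustration only.
json_standard = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "mid_json": ["s1", "s2", "s3"],
        "pass_label": ["yes", "no", "yes"],
    }
)
json_test = pd.DataFrame({"id": [2, 3, 4], "mid_json": ["t2", "t3", "t4"]})

inner = merge_with_labels(json_standard, json_test, how="inner")
outer = merge_with_labels(json_standard, json_test, how="outer")
```

With this shape, `pass_label` is carried along from `json_standard` for both the outer and the inner merge, and rows present only in `json_test` get a missing `pass_label` in the outer merge.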
@@ -36,7 +36,7 @@ def indicator_cal(json_standard,json_test):
     inner_merge = pd.merge(a, b, on='id', how='inner')
-    inner_merge.columns = ['id', 'standard_mid_json', 'test_mid_json']
+    inner_merge.columns = ['id', 'standard_mid_json', 'test_mid_json', 'pass_label']
     json_standard = inner_merge['standard_mid_json']  # check that the rows are aligned
     json_test = inner_merge['test_mid_json']
@@ -156,7 +156,14 @@ def indicator_cal(json_standard,json_test):
     """
-    '''Compute the overall edit distance and BLEU between PDFs'''
+    '''
+    Compute the overall edit distance and BLEU between PDFs.
+    Only positive-example PDFs are scored here.
+    '''
+    test_para_text = np.asarray(test_para_text, dtype=object)[inner_merge['pass_label'] == 'yes']
+    standard_para_text = np.asarray(standard_para_text, dtype=object)[inner_merge['pass_label'] == 'yes']
     pdf_dis = []
     pdf_bleu = []
     for a, b in zip(test_para_text, standard_para_text):
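The filter added in this hunk keeps only the entries whose `pass_label` is `'yes'` before the per-PDF loop runs. Below is a minimal sketch of that masking step, assuming `test_para_text` and `standard_para_text` are lists with one entry per row of `inner_merge` (the diff does not show how they are built). The sample values are made up.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the merged frame and the per-PDF paragraph lists.
inner_merge = pd.DataFrame({"id": [1, 2, 3], "pass_label": ["yes", "no", "yes"]})
test_para_text = [["a", "b"], ["c"], ["d", "e", "f"]]
standard_para_text = [["a", "b"], ["c"], ["d", "e", "x"]]

# A plain boolean array makes the selection purely positional.
mask = (inner_merge["pass_label"] == "yes").to_numpy()

# dtype=object lets NumPy hold one Python list per slot for ragged input.
test_kept = np.asarray(test_para_text, dtype=object)[mask]
standard_kept = np.asarray(standard_para_text, dtype=object)[mask]

# Only the positive examples reach the pairwise loop.
pairs = list(zip(test_kept, standard_kept))
```

The mask must have exactly one entry per element of each list, so this only works if both lists follow the row order of `inner_merge`. Rows with a missing `pass_label` compare unequal to `'yes'` and are dropped. If every entry happens to have the same length, `np.asarray` builds a 2-D object array instead, and the mask still selects along the first axis.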