Commit 6f58eeab, authored Aug 28, 2024 by drunkpig
merge: sync from master branch
parents: 9067cd31, 7f0fe200
Showing 7 changed files with 362 additions and 237 deletions
Dockerfile +1 -1
README_zh-CN.md.bak +287 -185
docs/download_models.py +4 -0
docs/how_to_download_models_en.md +1 -1
docs/how_to_download_models_zh_cn.md +3 -3
magic_pdf/para/para_split_v2.py +50 -47
signatures/version1/cla.json +16 -0

Dockerfile

README_zh-CN.md.bak
This diff is collapsed.

docs/download_models.py (new file, mode 0 → 100644)

+# use modelscope sdk download models
+from modelscope import snapshot_download
+model_dir = snapshot_download('opendatalab/PDF-Extract-Kit')
+print(f"model dir is: {model_dir}/models")
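
The script above prints the `models` subdirectory of the downloaded snapshot. A minimal sketch of wrapping that step in a reusable function follows. `resolve_models_dir` and its `download` parameter are hypothetical names introduced here, not part of the commit, and the assumption that the snapshot contains a `models` subdirectory comes only from the path the script prints.

```python
import os
from typing import Callable, Optional


def resolve_models_dir(
    repo_id: str = "opendatalab/PDF-Extract-Kit",
    download: Optional[Callable[[str], str]] = None,
) -> str:
    """Download a model snapshot and return the path of its `models` subdirectory.

    `download` is a hypothetical injection point: any callable that takes a
    repo id and returns the local snapshot directory. When omitted, the
    ModelScope SDK's `snapshot_download` is used, as in the script above.
    """
    if download is None:
        # Imported lazily so the helper can be exercised without modelscope installed.
        from modelscope import snapshot_download
        download = snapshot_download

    model_dir = download(repo_id)
    # Assumption: model weights live under <snapshot>/models, as the script prints.
    return os.path.join(model_dir, "models")
```

Passing the downloader as an argument is only a convenience for testing or for swapping in another hub client; calling `resolve_models_dir()` with no arguments should behave like the committed script, except that it returns the path instead of printing it.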

docs/how_to_download_models_en.md

...
@@ -9,7 +9,7 @@ git lfs install
 To download the `PDF-Extract-Kit` model from Hugging Face, use the following command:
 ```bash
-git lfs clone https://huggingface.co/wanderkid/PDF-Extract-Kit
+git lfs clone https://huggingface.co/opendatalab/PDF-Extract-Kit
 ```
 Ensure that Git LFS is enabled during the clone to properly download all large files.
...

docs/how_to_download_models_zh_cn.md

...
@@ -13,7 +13,7 @@
 ```bash
 git lfs install # Install the Git Large File Storage extension (Git LFS)
-git lfs clone https://huggingface.co/wanderkid/PDF-Extract-Kit # Download the PDF-Extract-Kit model from Hugging Face
+git lfs clone https://huggingface.co/opendatalab/PDF-Extract-Kit # Download the PDF-Extract-Kit model from Hugging Face
 ```
...
@@ -28,7 +28,7 @@ ModelScope supports SDK or model download; choose either one.
 ```bash
 git lfs install
-git lfs clone https://www.modelscope.cn/wanderkid/PDF-Extract-Kit.git
+git lfs clone https://www.modelscope.cn/opendatalab/PDF-Extract-Kit.git
 ```
 ### 2) Download using the SDK
...
@@ -41,7 +41,7 @@ pip install modelscope
 ```python
 # Download the model using the modelscope SDK
 from modelscope import snapshot_download
-model_dir = snapshot_download('wanderkid/PDF-Extract-Kit')
+model_dir = snapshot_download('opendatalab/PDF-Extract-Kit')
 print(f"Model files downloaded to: {model_dir}/models")
 ```
...

magic_pdf/para/para_split_v2.py

...
@@ -100,7 +100,7 @@ def __detect_list_lines(lines, new_layout_bboxes, lang):
     if lang != 'en':
         return lines, None
     else:
         total_lines = len(lines)
         line_fea_encode = []
         """
...
@@ -114,6 +114,9 @@ def __detect_list_lines(lines, new_layout_bboxes, lang):
         x_map_tag_dict, min_x_tag = cluster_line_x(lines)
         for l in lines:
             span_text = __get_span_text(l['spans'][0])
+            if not span_text:
+                line_fea_encode.append(0)
+                continue
             first_char = span_text[0]
             layout = __find_layout_bbox_by_line(l['bbox'], new_layout_bboxes)
             if not layout:
...
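
The second hunk adds a guard before `span_text[0]`: when the first span of a line has no text, indexing would raise `IndexError`, so the loop records `0` for that line and moves on. A simplified, self-contained sketch of the pattern follows. `first_char_features`, the `'content'` key, and the non-zero encoding are illustrative assumptions; the real `__get_span_text` and the feature values used in `para_split_v2.py` are not shown in this diff.

```python
from typing import Dict, List


def get_span_text(span: Dict) -> str:
    # Hypothetical stand-in for __get_span_text: the span's text, or "" if absent.
    return (span.get("content") or "").strip()


def first_char_features(lines: List[Dict]) -> List[int]:
    """Encode each line by its first character, keeping one feature per line."""
    features = []
    for line in lines:
        span_text = get_span_text(line["spans"][0])
        if not span_text:
            # Guard from the diff: an empty span gets feature 0 instead of
            # raising IndexError on span_text[0].
            features.append(0)
            continue
        first_char = span_text[0]
        # Illustrative encoding only: 1 if the line starts with an uppercase
        # letter or a digit, else 0.
        features.append(1 if first_char.isupper() or first_char.isdigit() else 0)
    return features
```

Appending `0` rather than skipping the line keeps the feature list the same length as `lines`, which presumably matters if later code indexes the two in parallel.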

signatures/version1/cla.json

...
@@ -31,6 +31,22 @@
       "created_at": "2024-08-13T12:23:16Z",
       "repoId": 765083837,
       "pullRequestNo": 418
+    },
+    {
+      "name": "Matthijz98",
+      "id": 17087153,
+      "comment_id": 2298912989,
+      "created_at": "2024-08-20T13:49:50Z",
+      "repoId": 765083837,
+      "pullRequestNo": 467
+    },
+    {
+      "name": "strongerfly",
+      "id": 11643869,
+      "comment_id": 2309481561,
+      "created_at": "2024-08-26T07:01:49Z",
+      "repoId": 765083837,
+      "pullRequestNo": 487
     }
   ]
 }
\ No newline at end of file
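
Each signature entry added in this hunk carries the same six fields. A minimal sketch of checking that shape before merging such a file is shown here. `validate_signature` is a hypothetical helper, not part of the repository, and the field types are inferred only from the values visible in the diff.

```python
from datetime import datetime
from typing import Any, Dict, List

# Field names and types as they appear in the entries above.
REQUIRED_FIELDS = {
    "name": str,
    "id": int,
    "comment_id": int,
    "created_at": str,
    "repoId": int,
    "pullRequestNo": int,
}


def validate_signature(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one signature entry (empty if none)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected):
            problems.append(f"wrong type for {field}")
    created = entry.get("created_at")
    if isinstance(created, str):
        try:
            # Timestamps in the diff use the form 2024-08-20T13:49:50Z.
            datetime.strptime(created, "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            problems.append("created_at is not an ISO 8601 UTC timestamp")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to report everything wrong with an entry in a single pass.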