Qin Kaijie / pdf-miner
Commit dd787f46, authored Jul 17, 2024 by myhloli
Merge remote-tracking branch 'origin/master'
Parents: 1e3c1ef5, 30f06136
Showing 4 changed files with 58 additions and 9 deletions:
README.md (+2, -3)
demo/demo.py (+3, -0)
docs/how_to_download_models_en.md (+28, -4)
docs/how_to_download_models_zh_cn.md (+25, -2)
README.md
@@ -64,10 +64,9 @@ https://github.com/opendatalab/MinerU/assets/11393164/618937cb-dc6a-4646-b433-e3

-### Submodule Repositories
+### Dependency repositorys
-- [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit)
-- A Comprehensive Toolkit for High-Quality PDF Content Extraction
+- [PDF-Extract-Kit : A Comprehensive Toolkit for High-Quality PDF Content Extraction](https://github.com/opendatalab/PDF-Extract-Kit) 🚀🚀🚀
## Getting Started
...
demo/demo.py
@@ -6,6 +6,9 @@ from loguru import logger
from magic_pdf.pipe.UNIPipe import UNIPipe
from magic_pdf.rw.DiskReaderWriter import DiskReaderWriter
import magic_pdf.model as model_config
model_config.__use_inside_model__ = True
try:
    current_script_dir = os.path.dirname(os.path.abspath(__file__))
    demo_name = "demo1"
...
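The lines that import `magic_pdf.model` and set `__use_inside_model__` rely on a module-level flag being set before the pipeline runs. What `magic_pdf` does with that flag is not shown in this diff, so the following is only a minimal sketch of the general pattern, using a stand-in module. The names `fake_model_config` and `choose_backend`, the assumed default of `False`, and the two backend labels are all hypothetical.

```python
import types

# Hypothetical stand-in for a configuration module such as magic_pdf.model;
# the real module's contents are not shown in this diff.
fake_model_config = types.ModuleType("fake_model_config")
fake_model_config.__use_inside_model__ = False  # assumed default, for illustration


def choose_backend(config=fake_model_config):
    """Return a label for the backend a library might pick from the flag."""
    # The flag is read when this function is called, so a caller has to
    # set it before invoking any code that depends on it.
    if getattr(config, "__use_inside_model__", False):
        return "built-in models"
    return "externally supplied model output"


# Mirrors the demo.py lines: flip the flag first, then run.
fake_model_config.__use_inside_model__ = True
print(choose_backend())
```

Because the flag lives on the module object, every importer of that module sees the same value, which is why the demo sets it once near the top of the script.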
docs/how_to_download_models_en.md
-#### Install Git LFS
+### Install Git LFS
Before you begin, make sure Git Large File Storage (Git LFS) is installed on your system. Install it using the following command:
```bash
git lfs install
```
-#### Download the Model from Hugging Face
+### Download the Model from Hugging Face
To download the `PDF-Extract-Kit` model from Hugging Face, use the following command:
```bash
...
@@ -15,13 +15,37 @@ git lfs clone https://huggingface.co/wanderkid/PDF-Extract-Kit
Ensure that Git LFS is enabled during the clone to properly download all large files.
Move the 'models' directory to a directory on a larger disk space, preferably an SSD.
### Download the Model from ModelScope
#### SDK Download
```bash
# First, install the ModelScope library using pip:
pip install modelscope
```
```python
# Use the following Python code to download the model using the ModelScope SDK:
from modelscope import snapshot_download
model_dir = snapshot_download('wanderkid/PDF-Extract-Kit')
```
#### Git Download
Alternatively, you can use Git to clone the model repository from ModelScope:
```bash
git clone https://www.modelscope.cn/wanderkid/PDF-Extract-Kit.git
```
Put [model files]() here:
```
./
├── Layout
│   ├── config.json
-│   └── model_final.pth
+│   └── weights.pth
├── MFD
│   └── weights.pt
├── MFR
...
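After downloading, it can help to confirm that the model directory matches the layout listed in the English guide. The sketch below checks only the entries visible in that listing, assuming the post-commit layout in which the Layout weights file is named `weights.pth`. The listing is truncated after `MFR`, so nothing beyond that point is checked. The function name `missing_model_entries` and the constants are hypothetical helpers, not part of the project.

```python
from pathlib import Path

# Only the entries visible in the directory listing; the listing is
# truncated after "MFR", so further files are deliberately not guessed.
EXPECTED_FILES = [
    "Layout/config.json",
    "Layout/weights.pth",
    "MFD/weights.pt",
]
EXPECTED_DIRS = ["MFR"]


def missing_model_entries(models_dir):
    """Return the expected relative paths that are absent under models_dir."""
    root = Path(models_dir)
    missing = [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
    missing += [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
    return missing
```

An empty list means every visible entry was found; any returned paths point at an incomplete download, for example a clone made without Git LFS enabled.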
docs/how_to_download_models_zh_cn.md
-#### Install Git LFS
+### Install Git LFS
Before you begin, make sure Git Large File Storage (Git LFS) is installed on your system. Install it with the following command:
```bash
git lfs install
```
-#### Download the Model from Hugging Face
+### Download the Model from Hugging Face
Use the following command to download the PDF-Extract-Kit model from Hugging Face:
```bash
...
@@ -15,6 +15,29 @@ git lfs clone https://huggingface.co/wanderkid/PDF-Extract-Kit
Make sure Git LFS is enabled during the clone so that all large files are downloaded correctly.
### Download the Model from ModelScope
#### SDK Download
```bash
# First, install modelscope
pip install modelscope
```
```python
# Download the model using the modelscope SDK
from modelscope import snapshot_download
model_dir = snapshot_download('wanderkid/PDF-Extract-Kit')
```
#### Git Download
You can also use git clone to download the model from ModelScope:
```bash
git clone https://www.modelscope.cn/wanderkid/PDF-Extract-Kit.git
```
Move the 'models' directory to a directory with more disk space, preferably on a solid-state drive (SSD).
...
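Both guides advise moving the `models` directory to a location with more disk space. A minimal sketch of that step using only the Python standard library follows. The function name `relocate_models` and the `min_free_bytes` threshold are hypothetical; the guides do not state how much space the models need.

```python
import shutil
from pathlib import Path


def relocate_models(src, dest_parent, min_free_bytes=0):
    """Move the directory src into dest_parent and return its new path.

    min_free_bytes is a hypothetical safety threshold, not a documented
    requirement of the project.
    """
    src = Path(src)
    dest_parent = Path(dest_parent)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    dest_parent.mkdir(parents=True, exist_ok=True)
    free = shutil.disk_usage(dest_parent).free
    if free < min_free_bytes:
        raise OSError(f"only {free} bytes free under {dest_parent}")
    target = dest_parent / src.name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    # shutil.move falls back to copy-and-delete when crossing filesystems.
    return Path(shutil.move(str(src), str(target)))
```

Refusing to overwrite an existing target keeps a partially populated destination from being silently merged with a fresh download.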