Commit 21d7a693, authored Jul 12, 2024 by myhloli

docs(readme): update instructions for model download and environment setup

parent 61fab96e
Showing 3 changed files with 84 additions and 6 deletions (+84, -6):

- README.md (+4, -0)
- README_zh-CN.md (+79, -4)
- docs/how_to_download_models.md (+1, -2)
README.md
...
...
@@ -75,6 +75,10 @@ https://github.com/opendatalab/MinerU/assets/11393164/618937cb-dc6a-4646-b433-e3
- Python >= 3.9
It is recommended to use a virtual environment, either with venv or conda.
Development is based on Python 3.10. If you encounter problems with other Python versions, please switch to Python 3.10.
### Usage Instructions
#### 1. Install Magic-PDF
...
...
README_zh-CN.md
...
...
@@ -70,23 +70,69 @@ https://github.com/opendatalab/MinerU/assets/11393164/618937cb-dc6a-4646-b433-e3
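Python >= 3.9
A virtual environment is recommended; either venv or conda works.
Development is based on Python 3.10. If you run into problems with other Python versions, switch to 3.10.
### Usage Instructions
#### 1. Install Magic-PDF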
```bash
# If you only need the basic features (without built-in model parsing)
pip install magic-pdf
# or
# Full parsing features (including built-in high-precision model parsing)
pip install magic-pdf[full-cpu]
# The detectron2 dependency must also be installed
# detectron2 must be compiled before installation; to build it yourself, see https://github.com/facebookresearch/detectron2/issues/5114
# Or install one of our prebuilt whl packages; choose the one that matches your system
# windows
pip install https://github.com/opendatalab/MinerU/raw/master/assets/whl/detectron2-0.6-cp310-cp310-win_amd64.whl
# linux
pip install https://github.com/opendatalab/MinerU/raw/master/assets/whl/detectron2-0.6-cp310-cp310-linux_x86_64.whl
# macOS(Intel)
pip install https://github.com/opendatalab/MinerU/raw/master/assets/whl/detectron2-0.6-cp310-cp310-macosx_10_9_universal2.whl
# macOS(M1/M2/M3)
pip install https://github.com/opendatalab/MinerU/raw/master/assets/whl/detectron2-0.6-cp310-cp310-macosx_11_0_arm64.whl
```
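Choosing the right prebuilt wheel can be scripted. The following is a minimal sketch, not part of Magic-PDF: the helper name `pick_detectron2_wheel` is hypothetical, and the mapping from `platform.system()` / `platform.machine()` values to wheel files is an assumption based on the file names in the commands above. All listed wheels are `cp310` builds, so they are expected to match Python 3.10 only.

```python
import platform

# Base URL and file names copied from the install commands above.
BASE_URL = "https://github.com/opendatalab/MinerU/raw/master/assets/whl/"

# Assumed mapping from (platform.system(), platform.machine()) to a wheel file.
WHEELS = {
    ("Windows", "AMD64"): "detectron2-0.6-cp310-cp310-win_amd64.whl",
    ("Linux", "x86_64"): "detectron2-0.6-cp310-cp310-linux_x86_64.whl",
    ("Darwin", "x86_64"): "detectron2-0.6-cp310-cp310-macosx_10_9_universal2.whl",
    ("Darwin", "arm64"): "detectron2-0.6-cp310-cp310-macosx_11_0_arm64.whl",
}


def pick_detectron2_wheel(system=None, machine=None):
    """Return the wheel URL for the given (or current) platform.

    Hypothetical helper: raises RuntimeError when no prebuilt wheel is listed,
    in which case detectron2 has to be compiled manually.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        return BASE_URL + WHEELS[(system, machine)]
    except KeyError:
        raise RuntimeError(
            f"no prebuilt detectron2 wheel listed for {system}/{machine}"
        ) from None
```

Calling `pick_detectron2_wheel()` without arguments uses the current platform; the returned URL can then be passed to `pip install`.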
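#### 2. Use via the command line
#### 2. Download the model weight files
###### Direct use
For details, see [How to download model files](docs/how_to_download_models.md)
After downloading, copy the models directory to a directory on a large SSD.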
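#### 3. Copy the configuration file and configure it
```bash
# Copy the configuration file to your home directory
cp magic-pdf.template.json ~/magic-pdf.json
```
In magic-pdf.json, set "models-dir" to the directory that contains the model weight files
```json
{
  "models-dir": "/tmp/models"
}
```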
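Editing the JSON by hand works, but the same change can be scripted so that other keys in the file are preserved. This is a minimal sketch under assumptions: `set_config_value` is a hypothetical helper, not a Magic-PDF API, and it only assumes the config file is a flat JSON object as shown above.

```python
import json
from pathlib import Path


def set_config_value(config_path, key, value):
    """Set one key in a JSON config file, keeping all other keys.

    Hypothetical helper; creates the file if it does not exist yet.
    """
    path = Path(config_path).expanduser()
    if path.exists():
        config = json.loads(path.read_text(encoding="utf-8"))
    else:
        config = {}
    config[key] = value
    path.write_text(
        json.dumps(config, indent=4, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

For example, `set_config_value("~/magic-pdf.json", "models-dir", "/tmp/models")` would apply the setting shown above.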
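#### 4. Use via the command line
###### Direct use
```bash
magic-pdf pdf-command --pdf "pdf_path" --inside_model true
```
After the program finishes, the generated markdown file is in the "/tmp/magic-pdf" directory, and the corresponding xxx_model.json file can be found in the markdown directory
If you want to do secondary development on the post-processing pipeline, you can use the command
```bash
magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
```
After the program finishes, the generated markdown file is in the "/tmp/magic-pdf" directory
This way the model does not have to be rerun, which makes debugging more convenient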
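The two invocations above differ only in whether the built-in model runs or a saved model JSON is reused. A minimal sketch of wrapping them from Python follows; `build_command` and `run_magic_pdf` are hypothetical helper names, and only the subcommand and flags shown above (`pdf-command`, `--pdf`, `--inside_model`, `--model`) are used.

```python
import subprocess


def build_command(pdf_path, model_json_path=None):
    """Build the magic-pdf argument list for one PDF.

    Without a model JSON, the built-in model is run (--inside_model true);
    with one, the saved model output is reused (--model).
    """
    cmd = ["magic-pdf", "pdf-command", "--pdf", str(pdf_path)]
    if model_json_path is None:
        cmd += ["--inside_model", "true"]
    else:
        cmd += ["--model", str(model_json_path)]
    return cmd


def run_magic_pdf(pdf_path, model_json_path=None):
    """Run magic-pdf and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(pdf_path, model_json_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.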
###### More usage
...
...
@@ -94,7 +140,36 @@ magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
```bash
magic-pdf --help
```
#### 3. Use via the API
#### 5. Accelerate with CUDA or MPS
###### CUDA
Install the PyTorch version that matches your CUDA version
```bash
# When using the GPU option, reinstall the PyTorch build for your CUDA version; this example installs the CUDA 11.8 build
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
You also need to change the value of "device-mode" in the configuration file magic-pdf.json
```json
{
  "device-mode": "cuda"
}
```
###### MPS
On macOS (devices with M-series chips), MPS can be used to accelerate inference
Change the value of "device-mode" in the configuration file magic-pdf.json
```json
{
  "device-mode": "mps"
}
```
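Which "device-mode" value fits a machine can be probed with PyTorch before editing the config. This is a minimal sketch under assumptions: the helper names are hypothetical, and the "cpu" fallback value is an assumption that does not appear in the text above, which only names "cuda" and "mps".

```python
def choose_device_mode(cuda_available, mps_available):
    """Pick a device-mode string, preferring CUDA, then MPS."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"  # assumed fallback value, not confirmed by the text above


def detect_device_mode():
    """Probe PyTorch for CUDA/MPS support; fall back if torch is missing."""
    try:
        import torch
    except ImportError:
        return choose_device_mode(False, False)
    # torch.backends.mps is absent in old PyTorch releases, so guard the lookup.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = bool(mps_backend is not None and mps_backend.is_available())
    return choose_device_mode(torch.cuda.is_available(), mps_available)
```

The result of `detect_device_mode()` can then be written to the "device-mode" key of magic-pdf.json.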
#### 6. Use via the API
###### Local use
```python
...
```
docs/how_to_download_models.md
...
...
@@ -15,8 +15,7 @@ git lfs clone https://huggingface.co/wanderkid/PDF-Extract-Kit
Ensure that Git LFS is enabled during the clone to properly download all large files.
Put [model files]() here:
Move the 'models' directory to a disk with more free space, preferably an SSD.
```
./
...
```