Commit 80d0db4d (pdf-miner), authored Jul 12, 2024 by myhloli

docs(readme): update installation and usage instructions

Parent: 956eac57. Changes: 1 file (README.md), 82 additions and 4 deletions.
@@ -84,18 +84,68 @@ Development is based on Python 3.10, should you encounter problems with other Py
#### 1. Install Magic-PDF
```bash
# If you only need the basic features (without built-in model parsing functionality)
pip install magic-pdf
# or
# For complete parsing capabilities (including high-precision model parsing);
# the quotes stop shells such as zsh from treating the brackets as a glob pattern
pip install "magic-pdf[full-cpu]"
# Additionally, you will need to install the dependency detectron2.
# For detectron2, compile it yourself as per https://github.com/facebookresearch/detectron2/issues/5114
# or use one of our precompiled wheels (all built for CPython 3.10, i.e. "cp310"):
# Windows
pip install https://github.com/opendatalab/MinerU/raw/master/assets/whl/detectron2-0.6-cp310-cp310-win_amd64.whl
# Linux
pip install https://github.com/opendatalab/MinerU/raw/master/assets/whl/detectron2-0.6-cp310-cp310-linux_x86_64.whl
# macOS (Intel)
pip install https://github.com/opendatalab/MinerU/raw/master/assets/whl/detectron2-0.6-cp310-cp310-macosx_10_9_universal2.whl
# macOS (M1/M2/M3)
pip install https://github.com/opendatalab/MinerU/raw/master/assets/whl/detectron2-0.6-cp310-cp310-macosx_11_0_arm64.whl
```
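Because every precompiled wheel above is a CPython 3.10 build, choosing one comes down to the operating system, the CPU architecture, and the interpreter version. The sketch below is a hypothetical convenience helper (`pick_detectron2_wheel` is not part of magic-pdf) that maps `platform.system()` and `platform.machine()` onto the four URLs listed above. The architecture strings it expects (`AMD64`, `x86_64`, `arm64`) are the usual CPython values; treat them as an assumption on less common platforms.

```python
import platform
import sys

# The four precompiled detectron2 wheels listed above, keyed by (OS, CPU architecture).
# All of them are CPython 3.10 builds ("cp310" in the file name).
BASE = "https://github.com/opendatalab/MinerU/raw/master/assets/whl/"
WHEELS = {
    ("Windows", "AMD64"): BASE + "detectron2-0.6-cp310-cp310-win_amd64.whl",
    ("Linux", "x86_64"): BASE + "detectron2-0.6-cp310-cp310-linux_x86_64.whl",
    ("Darwin", "x86_64"): BASE + "detectron2-0.6-cp310-cp310-macosx_10_9_universal2.whl",
    ("Darwin", "arm64"): BASE + "detectron2-0.6-cp310-cp310-macosx_11_0_arm64.whl",
}


def pick_detectron2_wheel(system=None, machine=None, py=None):
    """Return the matching wheel URL, or None if no precompiled wheel fits."""
    system = system or platform.system()
    machine = machine or platform.machine()
    py = py or sys.version_info[:2]
    if tuple(py) != (3, 10):
        return None  # the wheels are cp310-only, so other Pythons must build from source
    return WHEELS.get((system, machine))


if __name__ == "__main__":
    url = pick_detectron2_wheel()
    if url:
        print(f"pip install {url}")
    else:
        print("No precompiled wheel matches; build detectron2 from source.")
```

Returning `None` instead of guessing keeps the fallback explicit: anything outside the four listed combinations goes back to the compile-it-yourself route.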
#### 2. Downloading Model Weight Files
For detailed instructions, see [how_to_download_models](docs/how_to_download_models.md).

After downloading the model weights, move the `models` directory to a location with plenty of free disk space, preferably on an SSD.
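If you want to script the move, here is a minimal standard-library sketch. `relocate_models` is a hypothetical helper, not part of magic-pdf; it assumes the weights sit in a single `models` directory and simply refuses to move them if the destination filesystem would be left with less than `min_free_bytes` free.

```python
import shutil
from pathlib import Path


def relocate_models(src, dest_parent, min_free_bytes=0):
    """Move the `models` directory `src` into `dest_parent` and return its new path.

    Hypothetical helper: raises instead of overwriting an existing target or
    filling up the destination disk.
    """
    src = Path(src)
    dest_parent = Path(dest_parent)
    dest_parent.mkdir(parents=True, exist_ok=True)

    target = dest_parent / src.name
    if target.exists():
        raise FileExistsError(target)

    # Total size of every file under src, compared with free space at the destination.
    size = sum(p.stat().st_size for p in src.rglob("*") if p.is_file())
    free = shutil.disk_usage(dest_parent).free
    if free - size < min_free_bytes:
        raise OSError(f"not enough space on {dest_parent}: need {size} bytes, {free} free")

    shutil.move(str(src), str(target))
    return target
```

The space check is conservative when source and destination share a filesystem (a same-disk move needs no extra space), which is acceptable for a one-off relocation.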
#### 3. Copy and Edit the Configuration File
```bash
# Copy the configuration file to your home directory
cp magic-pdf.template.json ~/magic-pdf.json
```
In `magic-pdf.json`, set `"models-dir"` to the directory that contains the model weight files:
```json
{
  "models-dir": "/tmp/models"
}
```
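To make this edit from a script, a small standard-library sketch follows. `set_config_value` is a hypothetical helper; it assumes `~/magic-pdf.json` is plain JSON, as the snippet above suggests, and it keeps every other key in the file unchanged.

```python
import json
from pathlib import Path


def set_config_value(key, value, config_path=Path.home() / "magic-pdf.json"):
    """Set one key in magic-pdf.json, leaving all other keys untouched."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config[key] = value
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False) + "\n", encoding="utf-8")
    return config


# Example: point "models-dir" at the relocated weights (the path is illustrative).
# set_config_value("models-dir", "/mnt/ssd/models")
```

Reading and rewriting the whole file, instead of templating only one line, avoids clobbering whatever other settings the template ships with.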
#### 4. Usage via Command Line
###### simple
```bash
magic-pdf pdf-command --pdf "pdf_path" --inside_model true
```
After the program finishes, the generated Markdown files are written to the `/tmp/magic-pdf` directory. The corresponding `xxx_model.json` file is in the same Markdown output directory.
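Since the exact subdirectory layout under `/tmp/magic-pdf` is not spelled out here, a recursive search is the simplest way to collect the model output files. This sketch assumes only the `*_model.json` naming described above; `find_model_json` is a hypothetical helper.

```python
from pathlib import Path


def find_model_json(output_root="/tmp/magic-pdf"):
    """Return every *_model.json file under the magic-pdf output directory, sorted."""
    return sorted(Path(output_root).rglob("*_model.json"))


if __name__ == "__main__":
    for path in find_model_json():
        print(path)
```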
If you intend to do secondary development on the post-processing pipeline, pass the existing `xxx_model.json` to the `--model` option:
```bash
magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
```
This way you don't have to re-run model inference, which makes debugging more convenient.
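Both invocations can also be driven from Python. The sketch below uses only the flags shown above (`--pdf`, `--inside_model true`, `--model`); `build_magic_pdf_cmd` and `run_magic_pdf` are hypothetical wrappers, and they assume the `magic-pdf` executable is on your `PATH`.

```python
import subprocess


def build_magic_pdf_cmd(pdf_path, model_json_path=None):
    """Build the argv list for `magic-pdf pdf-command`.

    Without model_json_path: run the built-in model (--inside_model true).
    With model_json_path: reuse an existing xxx_model.json (--model).
    """
    cmd = ["magic-pdf", "pdf-command", "--pdf", str(pdf_path)]
    if model_json_path is None:
        cmd += ["--inside_model", "true"]
    else:
        cmd += ["--model", str(model_json_path)]
    return cmd


def run_magic_pdf(pdf_path, model_json_path=None):
    # check=True raises CalledProcessError if magic-pdf exits with a non-zero status.
    return subprocess.run(build_magic_pdf_cmd(pdf_path, model_json_path), check=True)
```

Passing a list instead of a shell string means paths containing spaces need no extra quoting.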
###### more
```bash
magic-pdf --help
```
#### 5. Acceleration Using CUDA or MPS
##### CUDA
Install the PyTorch build that matches your CUDA version:
```bash
# The GPU setup requires reinstalling PyTorch built for your CUDA version. This example installs the CUDA 11.8 build.
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
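To confirm that the reinstall actually produced a CUDA-enabled build, you can inspect PyTorch from Python. `describe_torch_build` is a hypothetical helper; `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes, and the import is guarded so the check also runs where PyTorch is absent.

```python
def describe_torch_build():
    """Report which PyTorch build is installed and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_build": None, "cuda_available": False}
    return {
        "torch": torch.__version__,                  # CUDA wheels usually carry a suffix such as "+cu118"
        "cuda_build": torch.version.cuda,            # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

If `cuda_build` is `None`, the CPU-only wheel is still installed and the `--force-reinstall` step above probably did not take effect.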
Then set `"device-mode"` in `magic-pdf.json` to `"cuda"`:
```json
{
  "device-mode": "cuda"
}
```
##### MPS
On Macs with Apple Silicon (M-series) chips, you can use MPS (Metal Performance Shaders) for inference acceleration. In this case, set `"device-mode"` in `magic-pdf.json` to `"mps"`:
```json
{
  "device-mode": "mps"
}
```
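The choice between the two values can be automated with PyTorch's own availability checks. This is a sketch, not magic-pdf logic: `detect_device_mode` is hypothetical, and returning `"cpu"` as the fallback assumes that is the config's default value, which this section does not show.

```python
def detect_device_mode():
    """Suggest a "device-mode" value for magic-pdf.json: "cuda", "mps", or "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # assumption: "cpu" is the non-accelerated default
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch releases; getattr keeps older builds safe.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(detect_device_mode())
```

Checking CUDA first means a machine with an NVIDIA GPU never falls through to MPS; on Apple Silicon, `torch.cuda.is_available()` is always false, so the MPS branch is reached.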
#### 6. Usage via API
###### Local
```python
...
```