Commit 928d123c authored by myhloli

docs: enhance documentation with important notices and tips

- Add important notice about git lfs download issues in model download docs
- Include warning about 0.9.x version changes in model update section
- Add tip for finding user directory in config file location
- Improve readability of TODO list in README files
- Standardize important notices and tips across multiple language versions
parent e20a62fd
@@ -138,7 +138,7 @@ There are three different ways to experience MinerU:
- [Linux/Windows + CUDA](#Using-GPU)
> [!IMPORTANT]
> **⚠️ Pre-installation Notice—Hardware and Software Environment Support**
> **Pre-installation Notice—Hardware and Software Environment Support**
>
> To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. This ensures that users deploying and running the project on recommended system configurations will get the best performance with the fewest compatibility issues.
>
@@ -258,14 +258,14 @@ If your device supports CUDA and meets the GPU requirements of the mainline envi
- [Ubuntu 22.04 LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_en_US.md)
- [Windows 10/11 + GPU](docs/README_Windows_CUDA_Acceleration_en_US.md)
- Quick Deployment with Docker
> [!IMPORTANT]
> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
>
> Before running this Docker, you can use the following command to check if your device supports CUDA acceleration on Docker.
>
> ```bash
> docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
> ```
> [!IMPORTANT]
> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
>
> Before running this Docker, you can use the following command to check if your device supports CUDA acceleration on Docker.
>
> ```bash
> docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
> ```
```bash
wget https://github.com/opendatalab/MinerU/raw/master/Dockerfile
docker build -t mineru:latest .
@@ -379,12 +379,12 @@ TODO
# TODO
- 🗹 Reading order based on the model
- 🗹 Recognition of `index` and `list` in the main text
- 🗹 Table recognition
- Code block recognition in the main text
- [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- Geometric shape recognition
- [x] Reading order based on the model
- [x] Recognition of `index` and `list` in the main text
- [x] Table recognition
- [ ] Code block recognition in the main text
- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- [ ] Geometric shape recognition
# Known Issues
......
@@ -264,14 +264,14 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
- [Ubuntu22.04LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_zh_CN.md)
- [Windows10/11 + GPU](docs/README_Windows_CUDA_Acceleration_zh_CN.md)
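- Quick Deployment with Docker
> [!IMPORTANT]
> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
>
> Before running this Docker, you can use the following command to check if your device supports CUDA acceleration on Docker.
>
> ```bash
> docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
> ```
> [!IMPORTANT]
> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
>
> Before running this Docker, you can use the following command to check if your device supports CUDA acceleration on Docker.
>
> ```bash
> docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
> ```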
```bash
wget https://github.com/opendatalab/MinerU/raw/master/Dockerfile
docker build -t mineru:latest .
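@@ -387,12 +387,12 @@ TODO
# TODO
- 🗹 Reading order based on the model
- 🗹 Recognition of `index` and `list` in the main text
- 🗹 Table recognition
- Code block recognition in the main text
- [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- Geometric shape recognition
- [x] Reading order based on the model
- [x] Recognition of `index` and `list` in the main text
- [x] Table recognition
- [ ] Code block recognition in the main text
- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- [ ] Geometric shape recognition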
# Known Issues
......
@@ -8,7 +8,8 @@ nvidia-smi
If you see information similar to the following, it means that the NVIDIA drivers are already installed, and you can skip Step 2.
Notice: `CUDA Version` should be >= 12.1. If the displayed version number is lower than 12.1, please upgrade the driver.
> [!NOTE]
> Notice: `CUDA Version` should be >= 12.1. If the displayed version number is lower than 12.1, please upgrade the driver.
```plaintext
+---------------------------------------------------------------------------------------+
@@ -64,14 +65,14 @@ conda activate MinerU
```sh
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
```
After installation, make sure to check the version of `magic-pdf` using the following command:
```sh
magic-pdf --version
```
If the version number is less than 0.7.0, please report the issue.
> [!IMPORTANT]
> After installation, make sure to check the version of `magic-pdf` using the following command:
>
> ```sh
> magic-pdf --version
> ```
>
> If the version number is less than 0.7.0, please report the issue.
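
A scripted form of this check might look like the sketch below. The helper names `parse_version`, `meets_minimum`, and `installed_magic_pdf_version` are hypothetical and not part of `magic-pdf`. The parser assumes plain dotted numeric versions such as `0.7.0` and ignores any suffix.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.9.2' into (0, 9, 2); stop at the first non-numeric part."""
    parts = []
    for piece in text.strip().split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.7.0") -> bool:
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_magic_pdf_version() -> Optional[str]:
    """Return the installed magic-pdf version, or None if it is missing."""
    try:
        return version("magic-pdf")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_magic_pdf_version()
    if found is None:
        print("magic-pdf is not installed in this environment")
    elif meets_minimum(found):
        print(f"magic-pdf {found} meets the 0.7.0 minimum")
    else:
        print(f"magic-pdf {found} is older than 0.7.0; please report the issue")
```

Run directly, the script reports whether the active environment satisfies the 0.7.0 minimum mentioned above.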
### 6. Download Models
@@ -82,6 +83,7 @@ Refer to detailed instructions on [how to download model files](how_to_download_
After completing the [6. Download Models](#6-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.
> [!TIP]
> The user directory for Linux is "/home/username".
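
To locate the file without hard-coding a platform-specific path, something like the following sketch could be used. `config_path` and `load_config` are hypothetical helpers, not part of `magic-pdf`, and the code assumes the file contains plain JSON, as its extension suggests.

```python
import json
from pathlib import Path
from typing import Optional, Union


def config_path(home: Optional[Union[str, Path]] = None) -> Path:
    """Return <user directory>/magic-pdf.json; defaults to the current user's home."""
    base = Path(home) if home is not None else Path.home()
    return base / "magic-pdf.json"


def load_config(path: Path) -> Optional[dict]:
    """Read the config as JSON, or return None if the file does not exist yet."""
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    path = config_path()
    config = load_config(path)
    if config is None:
        print(f"{path} not found; run the model download step first")
    else:
        print(f"Loaded {path} with keys: {sorted(config)}")
```

`Path.home()` resolves the user directory on Linux, Windows, and macOS, so the same script works on each.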
### 8. First Run
......
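@@ -8,7 +8,8 @@ nvidia-smi
If you see information similar to the following, it means that the NVIDIA drivers are already installed, and you can skip Step 2.
Note: the displayed `CUDA Version` should be >= 12.1. If it is lower than 12.1, please upgrade the driver.
> [!NOTE]
> The displayed `CUDA Version` should be >= 12.1. If it is lower than 12.1, please upgrade the driver.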
```plaintext
+---------------------------------------------------------------------------------------+
@@ -65,7 +66,8 @@ conda activate MinerU
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i https://mirrors.aliyun.com/pypi/simple
```
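> ❗️After the download completes, be sure to verify the version of magic-pdf with the following command:
> [!IMPORTANT]
> After the download completes, be sure to verify the version of magic-pdf with the following command: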
>
> ```bash
> magic-pdf --version
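@@ -82,6 +84,7 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
After completing the [6. Download Models](#6-下载模型) step, the script will automatically generate a magic-pdf.json file in the user directory and configure the default model path.
You can find the magic-pdf.json file in your user directory.
> [!TIP]
> The user directory for Linux is "/home/username".
## 8. First Run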
@@ -110,8 +113,8 @@ magic-pdf -p small_ocr.pdf -o ./output
```bash
magic-pdf -p small_ocr.pdf -o ./output
```
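> Tip: Whether CUDA acceleration is working can be roughly judged from the cost time of each stage in the log output. Typically, `layout detection cost` and `mfr time` should be more than 10 times faster.
> [!TIP]
> Whether CUDA acceleration is working can be roughly judged from the cost time of each stage in the log output. Typically, `layout detection cost` and `mfr time` should be more than 10 times faster.
## 10. Enable CUDA Acceleration for OCR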
@@ -126,5 +129,5 @@ python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.
```bash
magic-pdf -p small_ocr.pdf -o ./output
```
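> Tip: Whether CUDA acceleration is working can be roughly judged from the cost time of each stage in the log output. Typically, `ocr cost` should be more than 10 times faster.
> [!TIP]
> Whether CUDA acceleration is working can be roughly judged from the cost time of each stage in the log output. Typically, `ocr cost` should be more than 10 times faster.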
@@ -28,7 +28,8 @@ conda activate MinerU
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
```
> ❗️After installation, verify the version of `magic-pdf`:
> [!IMPORTANT]
> After installation, verify the version of `magic-pdf`:
>
> ```bash
> magic-pdf --version
@@ -45,6 +46,7 @@ Refer to detailed instructions on [how to download model files](how_to_download_
After completing the [5. Download Models](#5-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.
> [!TIP]
> The user directory for Windows is "C:/Users/username".
### 7. First Run
@@ -65,8 +67,8 @@ If your graphics card has at least 8GB of VRAM, follow these steps to test CUDA-
```
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
> ❗️Ensure the following versions are specified in the command:
> [!IMPORTANT]
> Ensure the following versions are specified in the command:
>
> ```
> torch==2.3.1 torchvision==0.18.1
......
@@ -29,7 +29,8 @@ conda activate MinerU
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i https://mirrors.aliyun.com/pypi/simple
```
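> ❗️After the download completes, be sure to verify the version of magic-pdf with the following command:
> [!IMPORTANT]
> After the download completes, be sure to verify the version of magic-pdf with the following command: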
>
> ```bash
> magic-pdf --version
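@@ -46,6 +47,7 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
After completing the [5. Download Models](#5-下载模型) step, the script will automatically generate a magic-pdf.json file in the user directory and configure the default model path.
You can find the magic-pdf.json file in your user directory.
> [!TIP]
> The user directory for Windows is "C:/Users/username".
## 7. First Run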
@@ -67,7 +69,8 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
> ❗️Ensure the following versions are specified in the command:
> [!IMPORTANT]
> Ensure the following versions are specified in the command:
>
> ```bash
> torch==2.3.1 torchvision==0.18.1
@@ -89,7 +92,8 @@ pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https
magic-pdf -p small_ocr.pdf -o ./output
```
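> Tip: Whether CUDA acceleration is working can be roughly judged from the time taken by each stage in the log output. Typically, `layout detection time` and `mfr time` should be more than 10 times faster.
> [!TIP]
> Whether CUDA acceleration is working can be roughly judged from the time taken by each stage in the log output. Typically, `layout detection time` and `mfr time` should be more than 10 times faster.
## 9. Enable CUDA Acceleration for OCR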
@@ -104,5 +108,5 @@ pip install paddlepaddle-gpu==2.6.1
```bash
magic-pdf -p small_ocr.pdf -o ./output
```
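> Tip: Whether CUDA acceleration is working can be roughly judged from the cost time of each stage in the log output. Typically, `ocr time` should be more than 10 times faster.
> [!TIP]
> Whether CUDA acceleration is working can be roughly judged from the cost time of each stage in the log output. Typically, `ocr time` should be more than 10 times faster.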
@@ -20,10 +20,12 @@ The configuration file can be found in the user directory, with the filename `ma
## 1. Models downloaded via Git LFS
> [!IMPORTANT]
> Some users have reported that model files downloaded via git lfs were incomplete or corrupted, so this method is no longer recommended.
For magic-pdf <= 0.8.1, if you previously downloaded the model files via git lfs, you can navigate to the previous download directory and update the models using the `git pull` command.
> [!WARNING]
> For versions 0.9.x and later, due to the repository change and the addition of the layout sorting model in PDF-Extract-Kit 1.0, the models cannot be updated using the `git pull` command. Instead, a Python script must be used for one-click updates.
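
The version rule above can be summarized in a few lines. This is only an illustrative sketch: `update_method` is a hypothetical helper, it expects a plain dotted numeric version, and it treats every version newer than 0.8.1 as requiring the Python script, which is an assumption for releases between 0.8.1 and 0.9.0.

```python
def update_method(magic_pdf_version: str) -> str:
    """Pick the model update route for a git-lfs checkout, per the rule above."""
    # Assumes a plain dotted numeric version; a suffix such as "rc1" raises ValueError.
    numbers = tuple(int(part) for part in magic_pdf_version.split(".")[:3])
    if numbers <= (0, 8, 1):
        # Still possible, although git lfs downloads are no longer recommended.
        return "git pull"
    # 0.9.x and later: the repository changed, so `git pull` cannot update the models.
    return "python script"


if __name__ == "__main__":
    for candidate in ("0.8.1", "0.9.0"):
        print(candidate, "->", update_method(candidate))
```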
## 2. Models downloaded via Hugging Face or Model Scope
......
@@ -26,16 +26,19 @@ python脚本会自动下载模型文件并配置好配置文件中的模型目
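The configuration file can be found in the user directory, with the filename `magic-pdf.json`.
> [!TIP]
> The user directory for Windows is "C:\\Users\\username", for Linux it is "/home/username", and for macOS it is "/Users/username".
# How to Update Previously Downloaded Models
## 1. Models downloaded via Git LFS
> [!IMPORTANT]
> Some users have reported that model files downloaded via git lfs were incomplete or corrupted, so this method is no longer recommended.
For magic-pdf <= 0.8.1, if you previously downloaded the model files via git lfs, you can navigate to the previous download directory and update the models using the `git pull` command.
> [!WARNING]
> For versions 0.9.x and later, because PDF-Extract-Kit 1.0 moved to a new repository and added a layout sorting model, the models cannot be updated using the `git pull` command. Instead, a Python script must be used for one-click updates.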
......