Unverified Commit 1030ebad authored by Xiaomeng Zhao, committed by GitHub

Merge pull request #701 from myhloli/dev

docs: update CUDA acceleration guides and README content
parents 01306098 a1c7b5a7
@@ -167,14 +167,13 @@ In non-mainline environments, due to the diversity of hardware and software configurations
    <td rowspan="2">GPU Hardware Support List</td>
    <td colspan="2">Minimum Requirement 8G+ VRAM</td>
    <td colspan="2">3060ti/3070/3080/3080ti/4060/4070/4070ti<br>
    8G VRAM enables layout, formula recognition, and OCR acceleration</td>
    <td rowspan="2">None</td>
  </tr>
  <tr>
    <td colspan="2">Recommended Configuration 16G+ VRAM</td>
    <td colspan="2">3090/3090ti/4070ti super/4080/4090<br>
    16G VRAM or more enables layout, formula recognition, OCR, and table recognition acceleration simultaneously
    </td>
  </tr>
</table>
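To check how much VRAM is actually free before enabling acceleration, the standard NVIDIA tool can be used (generic tooling, not specific to this project):

```bash
# list each GPU with its total and currently free VRAM
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```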
@@ -199,34 +198,20 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
Refer to [How to Download Model Files](docs/how_to_download_models_en.md) for detailed instructions.

#### 3. Modify the Configuration File for Additional Configuration

After completing the [2. Download model weight files](#2-download-model-weight-files) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your **user directory**.

> The user directory for Windows is "C:\Users\username", for Linux it is "/home/username", and for macOS it is "/Users/username".

You can modify certain configurations in this file to enable or disable features, such as table recognition:

> If the following items are not present in the JSON, add them manually and remove the comments (standard JSON does not support comments).
```json
{
    // other config
    "table-config": {
        "model": "TableMaster", // Another option for this value is 'struct_eqtable'
        "is_table_recog_enable": false, // Table recognition is disabled by default; change this value to enable it
@@ -307,7 +292,8 @@ The results will be saved in the `{some_output_dir}` directory. The output file
├── some_pdf_middle.json          # MinerU intermediate processing result
├── some_pdf_model.json           # model inference result
├── some_pdf_origin.pdf           # original PDF file
├── some_pdf_spans.pdf            # smallest granularity bbox position information diagram
└── some_pdf_content_list.json    # Rich text JSON arranged in reading order
```
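For reference, the tree above is what an invocation of the form used elsewhere in these docs produces (file and directory names here are placeholders):

```bash
# parse a PDF and write all result files into the given output directory
magic-pdf -p some_pdf.pdf -o some_output_dir -m auto
```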
For more information about the output files, please refer to the [Output File Description](docs/output_file_en_us.md).
@@ -353,7 +339,7 @@ TODO
# TODO
- [x] Semantic-based reading order
- [ ] List recognition within the text
- [ ] Code block recognition within the text
- [ ] Table of contents recognition
......
@@ -167,14 +167,13 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
    <td rowspan="2">GPU Hardware Support List</td>
    <td colspan="2">Minimum Requirement 8G+ VRAM</td>
    <td colspan="2">3060ti/3070/3080/3080ti/4060/4070/4070ti<br>
    8G VRAM enables layout, formula recognition, and OCR acceleration</td>
    <td rowspan="2">None</td>
  </tr>
  <tr>
    <td colspan="2">Recommended Configuration 16G+ VRAM</td>
    <td colspan="2">3090/3090ti/4070ti super/4080/4090<br>
    16G VRAM or more enables layout, formula recognition, OCR, and table recognition acceleration simultaneously
    </td>
  </tr>
</table>
@@ -201,35 +200,19 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
For details, see [How to Download Model Files](docs/how_to_download_models_zh_cn.md).

#### 3. Modify the Configuration File for Additional Configuration

After completing the [2. Download model weight files](#2-下载模型权重文件) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your **user directory**.

> The user directory for Windows is "C:\Users\username", for Linux it is "/home/username", and for macOS it is "/Users/username".

You can modify certain configurations in this file to enable or disable features, such as table recognition:

> If the following items are not present in the JSON, add them manually and remove the comments (standard JSON does not support comments).
```json
{
    // other config
    "table-config": {
        "model": "TableMaster", // To use StructEqTable, change this value to 'struct_eqtable'
        "is_table_recog_enable": false, // Table recognition is disabled by default; change this value to enable it
@@ -311,7 +294,8 @@ magic-pdf -p {some_pdf} -o {some_output_dir} -m auto
├── some_pdf_middle.json          # MinerU intermediate processing result
├── some_pdf_model.json           # model inference result
├── some_pdf_origin.pdf           # original PDF file
├── some_pdf_spans.pdf            # smallest granularity bbox position information diagram
└── some_pdf_content_list.json    # Rich text JSON arranged in reading order
```
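To take a quick look at the reading-order content list from the command line, a one-liner is enough (it only assumes the file is valid JSON; the exact schema is covered in the output file description linked below):

```bash
# print the top-level JSON type and the number of entries in the content list
python -c "import json; d = json.load(open('some_pdf_content_list.json', encoding='utf-8')); print(type(d).__name__, len(d))"
```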
For more information about the output files, see the [Output File Description](docs/output_file_zh_cn.md).
@@ -357,7 +341,7 @@ TODO
# TODO
- [x] Semantic-based reading order
- [ ] List recognition within the text
- [ ] Code block recognition within the text
- [ ] Table of contents recognition
......
@@ -58,28 +58,12 @@
### 6. Download Models

Refer to detailed instructions on [how to download model files](how_to_download_models_en.md).
### 7. Understand the Location of the Configuration File

After completing the [6. Download Models](#6-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.

> The user directory for Linux is "/home/username".
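To confirm the file was generated and to see the default paths it contains, a plain shell command (nothing project-specific) is enough:

```bash
# print the auto-generated configuration file
cat ~/magic-pdf.json
```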
### 8. First Run

Download a sample file from the repository and test it.
@@ -90,7 +74,9 @@
### 9. Test CUDA Acceleration

If your graphics card has at least **8GB** of VRAM, follow these steps to test CUDA acceleration:

> ❗ 8GB of VRAM is barely enough to run this application, so close all other programs that use VRAM to make sure the full 8GB is available while it runs.

1. Modify the value of `"device-mode"` in the `magic-pdf.json` configuration file located in your home directory.

```json
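// NOTE: the body of this block is collapsed in the diff view above; the lines below are only a sketch.
// "cuda" is assumed here as the value that enables GPU use, with "cpu" as the CPU-only default.
{
    // other config
    "device-mode": "cuda"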
@@ -105,8 +91,6 @@ If your graphics card has at least 8GB of VRAM, follow these steps to test CUDA
### 10. Enable CUDA Acceleration for OCR

1. Download `paddlepaddle-gpu`. Installation will automatically enable OCR acceleration.

```sh
python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
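# Optional sanity check (generic PaddlePaddle call, not part of the original guide):
# prints True when the GPU-enabled build is installed.
python -c "import paddle; print(paddle.is_compiled_with_cuda())"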
......
@@ -54,29 +54,11 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
## 6. Download Models

For details, see [how to download model files](how_to_download_models_zh_cn.md).

## 7. Understand the Location of the Configuration File

After completing the [6. Download Models](#6-下载模型) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.

> The user directory on Linux is "/home/username".
## 8. First Run

Download a sample file from the repository and test it.
@@ -85,7 +67,8 @@ wget https://gitee.com/myhloli/MinerU/raw/master/demo/small_ocr.pdf
magic-pdf -p small_ocr.pdf
```
## 9. Test CUDA Acceleration

If your graphics card has at least **8GB** of VRAM, you can follow the steps below to test the CUDA parsing speed-up.

> ❗️8GB of VRAM is barely enough to run this application, so close all other programs that use VRAM to make sure the full 8GB is available while it runs.

**1. Modify the value of "device-mode" in the magic-pdf.json configuration file in your user directory**

```json
@@ -100,7 +83,6 @@ magic-pdf -p small_ocr.pdf
> Tip: You can roughly check whether CUDA acceleration is working from the per-stage timings printed in the log; normally, `layout detection time` and `mfr time` should speed up by 10x or more.
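If you want to pull just those stage timings out of the console output, a simple filter works (the pattern below only assumes the log wording quoted in the tip above):

```bash
# re-run the sample and keep only the layout-detection and formula-recognition timing lines
magic-pdf -p small_ocr.pdf 2>&1 | grep -E "layout detection time|mfr time"
```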
## 10. Enable CUDA Acceleration for OCR

**1. Download paddlepaddle-gpu; OCR acceleration is enabled automatically after installation**

```bash
......
@@ -29,47 +29,25 @@ Download link: https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Windows-x86
### 5. Download Models

Refer to detailed instructions on [how to download model files](how_to_download_models_en.md).
### 6. Understand the Location of the Configuration File

After completing the [5. Download Models](#5-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.

> The user directory for Windows is "C:/Users/username".
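To confirm the file was generated and see its current contents, a standard PowerShell command (not part of the original guide) is enough:

```powershell
# print the auto-generated configuration file from the user directory
Get-Content "$env:USERPROFILE\magic-pdf.json"
```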
### 7. First Run

Download a sample file from the repository and test it.

```powershell
wget https://github.com/opendatalab/MinerU/raw/master/demo/small_ocr.pdf -O small_ocr.pdf
magic-pdf -p small_ocr.pdf
```

### 8. Test CUDA Acceleration

If your graphics card has at least 8GB of VRAM, follow these steps to test CUDA-accelerated parsing performance.

> ❗ 8GB of VRAM is barely enough to run this application, so close all other programs that use VRAM to make sure the full 8GB is available while it runs.

1. **Overwrite the installation of torch and torchvision** with CUDA-enabled builds.

```
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
@@ -93,12 +71,12 @@ Download link: https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Windows-x86
```
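After the reinstall, a quick generic PyTorch check (not part of the original guide) confirms that the CUDA build can actually see the GPU:

```
# should print True when the cu118 build is installed and a GPU is visible
python -c "import torch; print(torch.cuda.is_available())"
```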
### 9. Enable CUDA Acceleration for OCR

1. **Download paddlepaddle-gpu**, which will automatically enable OCR acceleration upon installation.

```
pip install paddlepaddle-gpu==2.6.1
```

2. **Run the following command to test OCR acceleration**:

```
magic-pdf -p small_ocr.pdf
```
@@ -31,42 +31,22 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
## 5. Download Models

For details, see [how to download model files](how_to_download_models_zh_cn.md).
## 6. Understand the Location of the Configuration File

After completing the [5. Download Models](#5-下载模型) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.

> The user directory on Windows is "C:/Users/username".
## 7. First Run

Download a sample file from the repository and test it.

```powershell
wget https://github.com/opendatalab/MinerU/raw/master/demo/small_ocr.pdf -O small_ocr.pdf
magic-pdf -p small_ocr.pdf
```

## 8. Test CUDA Acceleration

If your graphics card has at least **8GB** of VRAM, you can follow the steps below to test the CUDA parsing speed-up.

> ❗️8GB of VRAM is barely enough to run this application, so close all other programs that use VRAM to make sure the full 8GB is available while it runs.

**1. Reinstall torch and torchvision with CUDA support, overwriting the existing installation**

```bash
@@ -88,10 +68,9 @@ pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https
```bash
magic-pdf -p small_ocr.pdf
```
> Tip: You can roughly check whether CUDA acceleration is working from the per-stage timings printed in the log; normally, `layout detection time` and `mfr time` should speed up by 10x or more.
## 9. Enable CUDA Acceleration for OCR

**1. Download paddlepaddle-gpu; OCR acceleration is enabled automatically after installation**

```bash
@@ -101,5 +80,5 @@ pip install paddlepaddle-gpu==2.6.1
```bash
magic-pdf -p small_ocr.pdf
```
> Tip: You can roughly check whether CUDA acceleration is working from the per-stage timings printed in the log; normally, `ocr time` should speed up by 10x or more.