Refer to [How to Download Model Files](docs/how_to_download_models_en.md) for detailed instructions.
> ❗️After downloading the models, please make sure to verify the completeness of the model files.
#### 3. Modify the Configuration File for Additional Configuration
>
> Check if the model file sizes match the description on the webpage. If possible, use sha256 to verify the integrity of the files.
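The size and SHA256 check recommended above could be scripted along the following lines. This is a minimal sketch, not part of the project's tooling: the helper names `sha256_of` and `list_digests` and the `models` directory name are assumptions made for illustration. Compare the printed sizes and digests against the values published on the download page.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_digests(models_dir):
    """Map each file under models_dir (by relative path) to (size in bytes, SHA256 digest)."""
    root = Path(models_dir)
    return {
        p.relative_to(root).as_posix(): (p.stat().st_size, sha256_of(p))
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


if __name__ == "__main__":
    # "models" is a placeholder; point this at your actual download directory.
    models_dir = Path("models")
    if models_dir.is_dir():
        for name, (size, digest) in list_digests(models_dir).items():
            print(f"{digest}  {size:>12}  {name}")
    else:
        print(f"directory not found: {models_dir}")
```

Reading in chunks keeps memory use flat even for multi-gigabyte weight files.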
#### 3. Copy and configure the template file
After completing the [2. Download model weight files](#2-download-model-weight-files) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your **user directory**.
You can find the `magic-pdf.template.json` template configuration file in the root directory of the repository.
> The user directory for Windows is "C:\\Users\\username", for Linux it is "/home/username", and for macOS it is "/Users/username".
> ❗️Make sure to execute the following command to copy the configuration file to your **user directory**; otherwise, the program will not run.
You can modify certain configurations in this file to enable or disable features, such as table recognition:
>
> The user directory for Windows is `C:\Users\YourUsername`, for Linux it is `/home/YourUsername`, and for macOS it is `/Users/YourUsername`.
```bash
cp magic-pdf.template.json ~/magic-pdf.json
```
> If the required items are not present in the JSON file, add them manually and remove any comments (standard JSON does not support comments).
Find the `magic-pdf.json` file in your user directory and configure the "models-dir" path to point to the directory where the model weight files were downloaded in [Step 2](#2-download-model-weight-files).
> ❗️Make sure to correctly configure the **absolute path** to the model weight files directory, otherwise the program will not run because it can't find the model files.
>
> On Windows, this path should include the drive letter and all backslashes (`\`) in the path should be replaced with forward slashes (`/`) to avoid syntax errors in the JSON file due to escape sequences.
>
> For example: If the models are stored in the `models` directory at the root of the D drive, the `"models-dir"` value should be `D:/models`.
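As an illustration of the Windows path rule, the conversion can be done with Python's standard `pathlib`. This is a sketch only: the helper name `to_json_safe_path` is hypothetical and not part of the project.

```python
from pathlib import PureWindowsPath


def to_json_safe_path(windows_path):
    """Convert an absolute Windows path to forward slashes; reject relative paths."""
    p = PureWindowsPath(windows_path)
    if not p.is_absolute():
        raise ValueError(f"not an absolute Windows path: {windows_path!r}")
    return p.as_posix()


print(to_json_safe_path(r"D:\models"))  # D:/models
```

`PureWindowsPath` parses Windows paths on any operating system, so the conversion can be checked on Linux or macOS as well.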
Refer to detailed instructions on [how to download model files](how_to_download_models_en.md).
After downloading, move the `models` directory to an SSD with more space.
❗ After downloading the models, ensure they are complete:
- Check that the file sizes match the description on the website.
- If possible, verify the integrity using SHA256.
### 7. Understand the Location of the Configuration File
### 7. Configuration Before First Run
After completing the [6. Download Models](#6-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
Obtain the configuration template file `magic-pdf.template.json` from the root directory of the repository.
You can find the `magic-pdf.json` file in your user directory.
> The user directory for Linux is "/home/username".
❗ Execute the following command to copy the configuration file to your home directory, otherwise the program will not run:
Find the `magic-pdf.json` file in your home directory and configure `"models-dir"` to be the directory where the model weights from Step 6 were downloaded.
❗ Correctly specify the absolute path of the directory containing the model weights; otherwise, the program will fail due to missing model files.
```json
{
"models-dir": "/tmp/models"
}
```
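Before the first run, the configuration can be sanity-checked with a short script. The sketch below assumes only what the text states: the file is named `magic-pdf.json`, it lives in the home directory, and `"models-dir"` must be an absolute path to an existing directory. The `check_config` helper is hypothetical and not part of the project.

```python
import json
from pathlib import Path


def check_config(config_path):
    """Return a list of problems found in the config file; an empty list means it looks usable."""
    path = Path(config_path)
    if not path.is_file():
        return [f"config file not found: {path}"]
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON (note that comments are not allowed): {exc}"]
    if not isinstance(config, dict):
        return ["top-level JSON value must be an object"]
    models_dir = config.get("models-dir")
    if not isinstance(models_dir, str) or not models_dir:
        return ['missing or empty "models-dir" entry']
    problems = []
    if not Path(models_dir).is_absolute():
        problems.append(f'"models-dir" is not an absolute path: {models_dir}')
    if not Path(models_dir).is_dir():
        problems.append(f'"models-dir" does not exist or is not a directory: {models_dir}')
    return problems


if __name__ == "__main__":
    for problem in check_config(Path.home() / "magic-pdf.json"):
        print(problem)
```

The script prints nothing when no problems are found.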
### 8. First Run
Download a sample file from the repository and test it.
...
@@ -90,7 +74,9 @@
### 9. Test CUDA Acceleration
If your graphics card has at least **8GB** of VRAM, follow these steps to test CUDA acceleration:
> ❗ 8GB of VRAM is barely enough to run this application, so close all other programs that use VRAM to ensure the full 8GB is available while it runs.
1. Modify the value of `"device-mode"` in the `magic-pdf.json` configuration file located in your home directory.
```json
...
```
@@ -105,8 +91,6 @@ If your graphics card has at least 8GB of VRAM, follow these steps to test CUDA
### 10. Enable CUDA Acceleration for OCR
❗ The following operations require a graphics card with at least 16GB of VRAM; otherwise, the program may crash or experience reduced performance.
1. Download `paddlepaddle-gpu`. Installation will automatically enable OCR acceleration.
Refer to detailed instructions on [how to download model files](how_to_download_models_en.md).
After downloading, move the `models` directory to an SSD with more space.
>❗ After downloading the models, ensure they are complete:
>- Check that the file sizes match the description on the website.
>- If possible, verify the integrity using SHA256.
### 7. Understand the Location of the Configuration File
### 6. Configuration Before the First Run
After completing the [5. Download Models](#5-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
Obtain the configuration template file `magic-pdf.template.json` from the repository root directory.
You can find the `magic-pdf.json` file in your **user directory**.
> The user directory for Windows is "C:/Users/username".
>❗️Execute the following command to copy the configuration file to your user directory, or the program will not run.
>
> On Windows, the user directory is `C:\Users\username`.
Find the `magic-pdf.json` file in your user directory and configure `"models-dir"` to point to the directory where the model weights from step 5 were downloaded.
> ❗️Ensure the absolute path of the model weights directory is correctly configured, or the program will fail to run due to not finding the model files.
>
> On Windows, this path should include the drive letter, and every `\` should be replaced with `/`.
>
> Example: If the models are placed in the `models` directory at the root of drive D, the value for `models-dir` should be `"D:/models"`.
```json
{
"models-dir": "/tmp/models"
}
```
### 7. First Run
Download a sample file from the repository and test it.
If your graphics card has at least 8GB of VRAM, follow these steps to test CUDA-accelerated parsing performance.
> ❗ 8GB of VRAM is barely enough to run this application, so close all other programs that use VRAM to ensure the full 8GB is available while it runs.
1. **Overwrite the installation of torch and torchvision** with versions that support CUDA.