- 2024/11/06 0.9.2 released. Integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition functionality.
- 2024/10/31 0.9.0 released. This is a major new version with extensive code refactoring, addressing numerous issues, improving performance, reducing hardware requirements, and enhancing usability:
  - Refactored the sorting module code to use [layoutreader](https://github.com/ppaanngggg/layoutreader) for reading order sorting, ensuring high accuracy in various layouts.
  - Refactored the paragraph concatenation module to achieve good results in cross-column, cross-page, cross-figure, and cross-table scenarios.
...
- [Quick CPU Demo (Windows, Linux, Mac)](#quick-cpu-demo)
- [Linux/Windows + CUDA](#Using-GPU)
> [!WARNING]
> **Pre-installation Notice: Hardware and Software Environment Support**
>
> To ensure the stability and reliability of the project, we optimize and test only for specific hardware and software environments during development. As a result, users who deploy and run the project on the recommended system configurations get the best performance with the fewest compatibility issues.
>
> By focusing resources on the mainline environment, our team can resolve potential bugs and develop new features more efficiently.
>
> In non-mainline environments, the diversity of hardware and software configurations, together with third-party dependency compatibility issues, means we cannot guarantee 100% project availability. If you want to use this project in a non-recommended environment, we suggest reading the documentation and FAQ carefully first; most issues already have solutions in the FAQ. We also welcome community feedback to help us gradually expand support.
<table>
<table>
<tr>
<tr>
...
After completing the [2. Download model weight files](#2-download-model-weight-files) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.
> [!TIP]
> The user directory for Windows is "C:\\Users\\username", for Linux it is "/home/username", and for macOS it is "/Users/username".
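
If you want to confirm programmatically that the file exists, the following minimal sketch builds the expected path with Python's standard `pathlib` module. It assumes the file sits directly in the directory returned by `Path.home()`, which matches the per-OS locations listed above; `find_config` and `load_config` are hypothetical helper names, not part of magic-pdf.

```python
import json
from pathlib import Path
from typing import Optional


def find_config(home: Optional[Path] = None) -> Path:
    """Return the expected path of magic-pdf.json in the user directory."""
    # Path.home() resolves to C:\Users\username, /home/username or /Users/username
    base = home if home is not None else Path.home()
    return base / "magic-pdf.json"


def load_config(path: Path) -> dict:
    """Parse the config; raises json.JSONDecodeError if comments or other invalid JSON remain."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    config_path = find_config()
    if config_path.exists():
        print(f"Found {config_path}")
    else:
        print(f"{config_path} not found; complete the model download step first")
```

Because `json.load` accepts only standard JSON, `load_config` fails with `json.JSONDecodeError` if the file still contains comments, which makes it a quick way to validate your edits.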
You can modify certain configurations in this file to enable or disable features, such as table recognition:
> [!NOTE]
> If any of the following items are missing from your JSON file, add them manually and delete the comments (standard JSON does not support comments).
```json
```json
...
> [!IMPORTANT]
> After installation, make sure to check the version of `magic-pdf` using the following command:
>
> ```sh
> magic-pdf --version
> ```
>
> If the version number is less than 0.7.0, please report the issue.
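
The same check can be scripted, for example in a CI job. This is a minimal Python sketch, not part of magic-pdf: it assumes `magic-pdf --version` prints to stdout a line containing a dotted numeric version such as `0.9.2`, and it ignores any pre-release suffix. `parse_version`, `is_too_old`, and `installed_version_output` are hypothetical helper names.

```python
import re
import subprocess
from typing import Tuple

# Threshold from the note above: anything older should be reported.
MIN_VERSION: Tuple[int, ...] = (0, 7, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted number (e.g. '0.9.2') from the CLI output."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def _pad(version: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    """Right-pad with zeros so that '0.7' compares equal to '0.7.0'."""
    return version + (0,) * (width - len(version))


def is_too_old(text: str, minimum: Tuple[int, ...] = MIN_VERSION) -> bool:
    """Return True if the reported version is lower than `minimum`."""
    version = parse_version(text)
    width = max(len(version), len(minimum))
    return _pad(version, width) < _pad(minimum, width)


def installed_version_output() -> str:
    """Run the same command as above; requires magic-pdf on PATH."""
    return subprocess.run(
        ["magic-pdf", "--version"], capture_output=True, text=True, check=True
    ).stdout
```

Usage: `is_too_old(installed_version_output())`. Versions are compared as integer tuples rather than strings, so `0.10.0` correctly ranks above `0.7.0`; a plain string comparison would get this wrong.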
### 6. Download Models
...
After completing the [6. Download Models](#6-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.
> [!TIP]
> The user directory for Linux is "/home/username".

> [!IMPORTANT]
> After installation, verify the version of `magic-pdf`:
>
> ```bash
> ```bash
> magic-pdf --version
> magic-pdf --version
...
After completing the [5. Download Models](#5-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.
> [!TIP]
> The user directory for Windows is "C:/Users/username".
### 7. First Run
...
## 1. Models downloaded via Git LFS
> [!IMPORTANT]
> Some users reported that downloading model files with Git LFS was incomplete or produced corrupted files, so this method is no longer recommended.
>
> For versions 0.9.x and later, the models cannot be updated with the `git pull` command, because the model repository changed and PDF-Extract-Kit 1.0 added a layout sorting model. Use a Python script for one-click updates instead.

If you are running magic-pdf <= 0.8.1 and previously downloaded the model files via Git LFS, you can go to the original download directory and update the models with the `git pull` command.
## 2. Models downloaded via Hugging Face or Model Scope
If you previously downloaded models via Hugging Face or Model Scope, you can rerun the Python script used for the initial download. This will automatically update the model directory to the latest version.