- 10 Sep, 2024 9 commits
-
-
drunkpig authored
* release: release 0.7.1 version (#526) * Update README_zh-CN.md (#404) (#409) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * add dockerfile (#189) Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> * Update cla.yml * Update cla.yml * feat<table model>: add tablemaster with paddleocr to detect and recognize table (#493) * Update cla.yml * Update bug_report.yml * Update README_zh-CN.md (#404) correct FAQ url * Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * Update FAQ_zh_cn.md add new issue * Update FAQ_en_us.md * Update README_Windows_CUDA_Acceleration_zh_CN.md * Update README_zh-CN.md * @Thepathakarpit has signed the CLA in opendatalab/MinerU#418 * Update cla.yml * feat: add tablemaster_paddle (#463) * Update README_zh-CN.md (#404) (#409) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * add dockerfile (#189) Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> * Update cla.yml * Update cla.yml --------- Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> * <fix>(para_split_v2): index out of range issue of span_text first char (#396) Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> * @Matthijz98 has signed the CLA in opendatalab/MinerU#467 * Create download_models.py * Create requirements-docker.txt * feat<table model>: add tablemaster with paddleocr to detect and recognize table * @strongerfly has signed the CLA in opendatalab/MinerU#487 * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table --------- Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> * feat<table model>: add tablemaster with paddleocr to detect and recognize table (#508) * Update cla.yml * Update bug_report.yml * Update README_zh-CN.md (#404) correct FAQ url * Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * Update FAQ_zh_cn.md add new issue * Update FAQ_en_us.md * Update README_Windows_CUDA_Acceleration_zh_CN.md * Update README_zh-CN.md * @Thepathakarpit has signed the CLA in opendatalab/MinerU#418 * Update cla.yml * feat: add tablemaster_paddle (#463) * Update README_zh-CN.md (#404) (#409) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * add dockerfile (#189) Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> * Update cla.yml * Update cla.yml --------- Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> * <fix>(para_split_v2): index out of range issue of span_text first char (#396) Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> * @Matthijz98 has signed the CLA in opendatalab/MinerU#467 * Create download_models.py * Create requirements-docker.txt * feat<table model>: add tablemaster with paddleocr to detect and recognize table * @strongerfly has signed the CLA in opendatalab/MinerU#487 * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * Update cla.yml * Delete .github/workflows/gpu-ci.yml * Update Huggingface and ModelScope links to organization account * feat<table model>: add tablemaster with paddleocr to detect and recognize table --------- Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> Co-authored-by:
yyy <102640628+dt-yy@users.noreply.github.com> Co-authored-by:
wangbinDL <wangbin_research@163.com> * feat<table model>: add tablemaster with paddleocr to detect and recognize table (#511) * Update cla.yml * Update bug_report.yml * Update README_zh-CN.md (#404) correct FAQ url * Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * Update FAQ_zh_cn.md add new issue * Update FAQ_en_us.md * Update README_Windows_CUDA_Acceleration_zh_CN.md * Update README_zh-CN.md * @Thepathakarpit has signed the CLA in opendatalab/MinerU#418 * Update cla.yml * feat: add tablemaster_paddle (#463) * Update README_zh-CN.md (#404) (#409) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * add dockerfile (#189) Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> * Update cla.yml * Update cla.yml --------- Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> * <fix>(para_split_v2): index out of range issue of span_text first char (#396) Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> * @Matthijz98 has signed the CLA in opendatalab/MinerU#467 * Create download_models.py * Create requirements-docker.txt * feat<table model>: add tablemaster with paddleocr to detect and recognize table * @strongerfly has signed the CLA in opendatalab/MinerU#487 * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * Update cla.yml * Delete .github/workflows/gpu-ci.yml * Update Huggingface and ModelScope links to organization account * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table --------- Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> Co-authored-by:
yyy <102640628+dt-yy@users.noreply.github.com> Co-authored-by:
wangbinDL <wangbin_research@163.com> --------- Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
Kaiwen Liu <lkw_buaa@163.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> Co-authored-by:
wangbinDL <wangbin_research@163.com> * Hotfix readme 0.7.1 (#528) * Update README.md * Update README_zh-CN.md * Update README_zh-CN.md * Update README.md * Update README_zh-CN.md * Update README_zh-CN.md add HF、modelscope、colab url * Update README.md * Update README.md * Update README.md * Update README.md * Update README_zh-CN.md * Rename README.md to README_zh-CN.md * Create readme.md * Rename readme.md to README.md * Rename README.md to README_zh-CN.md * Update README_zh-CN.md * Create README.md * Update README.md * Update README.md * Update README.md * Update README_zh-CN.md * Create download_models_hf.py * Update README.md * Update README_zh-CN.md * Update README_zh-CN.md * Update README.md * Update README_zh-CN.md * Update FAQ_zh_cn.md * Update FAQ_en_us.md * Update FAQ_zh_cn.md * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 (#573) * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 * Update README_zh-CN.md * Update README.md * Update README.md * Update README.md * Update README_zh-CN.md * add rag data api * Update README_zh-CN.md update rag api image * Update README.md docs: remove RAG related release notes * Update README_zh-CN.md docs: remove RAG related release notes * Update README_zh-CN.md update 更新记录 --------- Co-authored-by:
yyy <102640628+dt-yy@users.noreply.github.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
Kaiwen Liu <lkw_buaa@163.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> Co-authored-by:
wangbinDL <wangbin_research@163.com>
-
linfeng authored
* feat: mineru_web * fix: mineru_web * fix: mineru_web
-
Xiaomeng Zhao authored
feat(gradio_app): add web app with PDF processing as a project
-
myhloli authored
-
Xiaomeng Zhao authored
refactor(pdf_extract_kit): update model config and weight paths for UniMERNet-0.2.0
-
Xiaomeng Zhao authored
feat(ocr): supports minority languages
-
myhloli authored
Update the paths to model weights and configuration files for the UniMERNet architecture in both the demo.yaml and model_configs.yaml files. Adjust the mfr_model_init function toreflect the new weight and configuration paths. The changes include specifying more detailed paths to the unimernet_base directory and changing the weight file extension to .pth.
-
myhloli authored
Update the paths to model weights and configuration files for the UniMERNet architecture in both the demo.yaml and model_configs.yaml files. Adjust the mfr_model_init function toreflect the new weight and configuration paths. The changes include specifying more detailed paths to the unimernet_base directory and changing the weight file extension to .pth.
-
myhloli authored
Update the paths to model weights and configuration files for the UniMERNet architecture in both the demo.yaml and model_configs.yaml files. Adjust the mfr_model_init function toreflect the new weight and configuration paths. The changes include specifying more detailed paths to the unimernet_base directory and changing the weight file extension to .pth.
-
- 09 Sep, 2024 7 commits
-
-
myhloli authored
Introduce a new parameter `lang` to the `UNIPipe` class's `parse_union_pdf` function to allow specifying the language for document analysis. This enhancement enables users to customizethe language setting for better processing of multilingual documents.
-
myhloli authored
-
Kaiwen Liu authored
* fix: resolve inaccuracy of drawing layout box caused by paragraphs combination * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 * Update README.md * Update README_zh-CN.md * Update README_zh-CN.md add HF、modelscope、colab url * Update README.md * Update README.md * Update README.md * Update README.md * Update README_zh-CN.md * Rename README.md to README_zh-CN.md * Create readme.md * Rename readme.md to README.md * Rename README.md to README_zh-CN.md * Update README_zh-CN.md * Create README.md * Update README.md * Update README.md * Update README.md * Update README_zh-CN.md * Update README.md * Update README_zh-CN.md * Update README_zh-CN.md * Update README.md * Update README_zh-CN.md * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 --------- Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
sfk <18810651050@163.com>
-
myhloli authored
-
myhloli authored
Pass the `lang` parameter to `custom_model_init` in `doc_analyze` to support language-specific OCR configurations. This enhancement allows the use of language information to improve OCR accuracy when processing PDFs.
-
Xiaomeng Zhao authored
-
yanqiangmiffy authored
* features@add mineru gpu&web_api * features@update api
-
- 05 Sep, 2024 2 commits
-
-
linfeng authored
-
Xiaomeng Zhao authored
-
- 03 Sep, 2024 6 commits
-
-
Xiaomeng Zhao authored
Refactor the pdf_extract_kit module to utilize a singleton pattern when initializing atomic models. This change ensures that atomic models are instantiated at most once, optimizing memory usage and reducing redundant initialization steps. The AtomModelSingleton class now manages the instantiation and retrieval of atomic models, improving the overall structure and efficiency of the codebase.
-
icecraft authored
* feat: support figure footnote * feat: using the relative position to combine footnote, table, image * feat: add the readme of projects * fix: code spell in unittest --------- Co-authored-by:
icecraft <xurui1@pjlab.org.cn>
-
Xiaomeng Zhao authored
Removed the previously used gradio and gradio-pdf imports which were not leveraged in the code. Also, replaced the custom `show_pdf` function with direct use of the `PDF` component from gradio for a simpler and more integrated PDF upload and display solution, improving code maintainability and readability.
-
icecraft authored
Co-authored-by:
icecraft <xurui1@pjlab.org.cn>
-
Kaiwen Liu authored
* fix: resolve inaccuracy of drawing layout box caused by paragraphs combination * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 * fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384
-
Xiaomeng Zhao authored
-
- 02 Sep, 2024 7 commits
-
-
sfk authored
delete Known issue about table recognition
-
sfk authored
* release: release 0.7.1 version (#526) * Update README_zh-CN.md (#404) (#409) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * add dockerfile (#189) Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> * Update cla.yml * Update cla.yml * feat<table model>: add tablemaster with paddleocr to detect and recognize table (#493) * Update cla.yml * Update bug_report.yml * Update README_zh-CN.md (#404) correct FAQ url * Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * Update FAQ_zh_cn.md add new issue * Update FAQ_en_us.md * Update README_Windows_CUDA_Acceleration_zh_CN.md * Update README_zh-CN.md * @Thepathakarpit has signed the CLA in opendatalab/MinerU#418 * Update cla.yml * feat: add tablemaster_paddle (#463) * Update README_zh-CN.md (#404) (#409) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * add dockerfile (#189) Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> * Update cla.yml * Update cla.yml --------- Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> * <fix>(para_split_v2): index out of range issue of span_text first char (#396) Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> * @Matthijz98 has signed the CLA in opendatalab/MinerU#467 * Create download_models.py * Create requirements-docker.txt * feat<table model>: add tablemaster with paddleocr to detect and recognize table * @strongerfly has signed the CLA in opendatalab/MinerU#487 * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table --------- Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> * feat<table model>: add tablemaster with paddleocr to detect and recognize table (#508) * Update cla.yml * Update bug_report.yml * Update README_zh-CN.md (#404) correct FAQ url * Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * Update FAQ_zh_cn.md add new issue * Update FAQ_en_us.md * Update README_Windows_CUDA_Acceleration_zh_CN.md * Update README_zh-CN.md * @Thepathakarpit has signed the CLA in opendatalab/MinerU#418 * Update cla.yml * feat: add tablemaster_paddle (#463) * Update README_zh-CN.md (#404) (#409) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * add dockerfile (#189) Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> * Update cla.yml * Update cla.yml --------- Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> * <fix>(para_split_v2): index out of range issue of span_text first char (#396) Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> * @Matthijz98 has signed the CLA in opendatalab/MinerU#467 * Create download_models.py * Create requirements-docker.txt * feat<table model>: add tablemaster with paddleocr to detect and recognize table * @strongerfly has signed the CLA in opendatalab/MinerU#487 * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * Update cla.yml * Delete .github/workflows/gpu-ci.yml * Update Huggingface and ModelScope links to organization account * feat<table model>: add tablemaster with paddleocr to detect and recognize table --------- Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> Co-authored-by:
yyy <102640628+dt-yy@users.noreply.github.com> Co-authored-by:
wangbinDL <wangbin_research@163.com> * feat<table model>: add tablemaster with paddleocr to detect and recognize table (#511) * Update cla.yml * Update bug_report.yml * Update README_zh-CN.md (#404) correct FAQ url * Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * Update FAQ_zh_cn.md add new issue * Update FAQ_en_us.md * Update README_Windows_CUDA_Acceleration_zh_CN.md * Update README_zh-CN.md * @Thepathakarpit has signed the CLA in opendatalab/MinerU#418 * Update cla.yml * feat: add tablemaster_paddle (#463) * Update README_zh-CN.md (#404) (#409) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * add dockerfile (#189) Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> * Update cla.yml * Update cla.yml --------- Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> * <fix>(para_split_v2): index out of range issue of span_text first char (#396) Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> * @Matthijz98 has signed the CLA in opendatalab/MinerU#467 * Create download_models.py * Create requirements-docker.txt * feat<table model>: add tablemaster with paddleocr to detect and recognize table * @strongerfly has signed the CLA in opendatalab/MinerU#487 * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * Update cla.yml * Delete .github/workflows/gpu-ci.yml * Update Huggingface and ModelScope links to organization account * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table --------- Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> Co-authored-by:
yyy <102640628+dt-yy@users.noreply.github.com> Co-authored-by:
wangbinDL <wangbin_research@163.com> --------- Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
Kaiwen Liu <lkw_buaa@163.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> Co-authored-by:
wangbinDL <wangbin_research@163.com> * Update README.md * Update README_zh-CN.md * Update README_zh-CN.md --------- Co-authored-by:
yyy <102640628+dt-yy@users.noreply.github.com> Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
Kaiwen Liu <lkw_buaa@163.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> Co-authored-by:
wangbinDL <wangbin_research@163.com>
-
yyy authored
* feat<table model>: add tablemaster with paddleocr to detect and recognize table (#493) * Update cla.yml * Update bug_report.yml * Update README_zh-CN.md (#404) correct FAQ url * Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * Update FAQ_zh_cn.md add new issue * Update FAQ_en_us.md * Update README_Windows_CUDA_Acceleration_zh_CN.md * Update README_zh-CN.md * @Thepathakarpit has signed the CLA in opendatalab/MinerU#418 * Update cla.yml * feat: add tablemaster_paddle (#463) * Update README_zh-CN.md (#404) (#409) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * add dockerfile (#189) Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> * Update cla.yml * Update cla.yml --------- Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> * <fix>(para_split_v2): index out of range issue of span_text first char (#396) Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> * @Matthijz98 has signed the CLA in opendatalab/MinerU#467 * Create download_models.py * Create requirements-docker.txt * feat<table model>: add tablemaster with paddleocr to detect and recognize table * @strongerfly has signed the CLA in opendatalab/MinerU#487 * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table --------- Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> * feat<table model>: add tablemaster with paddleocr to detect and recognize table (#508) * Update cla.yml * Update bug_report.yml * Update README_zh-CN.md (#404) correct FAQ url * Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * Update FAQ_zh_cn.md add new issue * Update FAQ_en_us.md * Update README_Windows_CUDA_Acceleration_zh_CN.md * Update README_zh-CN.md * @Thepathakarpit has signed the CLA in opendatalab/MinerU#418 * Update cla.yml * feat: add tablemaster_paddle (#463) * Update README_zh-CN.md (#404) (#409) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * add dockerfile (#189) Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> * Update cla.yml * Update cla.yml --------- Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> * <fix>(para_split_v2): index out of range issue of span_text first char (#396) Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> * @Matthijz98 has signed the CLA in opendatalab/MinerU#467 * Create download_models.py * Create requirements-docker.txt * feat<table model>: add tablemaster with paddleocr to detect and recognize table * @strongerfly has signed the CLA in opendatalab/MinerU#487 * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * Update cla.yml * Delete .github/workflows/gpu-ci.yml * Update Huggingface and ModelScope links to organization account * feat<table model>: add tablemaster with paddleocr to detect and recognize table --------- Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> Co-authored-by:
yyy <102640628+dt-yy@users.noreply.github.com> Co-authored-by:
wangbinDL <wangbin_research@163.com> * feat<table model>: add tablemaster with paddleocr to detect and recognize table (#511) * Update cla.yml * Update bug_report.yml * Update README_zh-CN.md (#404) correct FAQ url * Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * Update FAQ_zh_cn.md add new issue * Update FAQ_en_us.md * Update README_Windows_CUDA_Acceleration_zh_CN.md * Update README_zh-CN.md * @Thepathakarpit has signed the CLA in opendatalab/MinerU#418 * Update cla.yml * feat: add tablemaster_paddle (#463) * Update README_zh-CN.md (#404) (#409) correct FAQ url Co-authored-by:
sfk <18810651050@163.com> * add dockerfile (#189) Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> * Update cla.yml * Update cla.yml --------- Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> * <fix>(para_split_v2): index out of range issue of span_text first char (#396) Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> * @Matthijz98 has signed the CLA in opendatalab/MinerU#467 * Create download_models.py * Create requirements-docker.txt * feat<table model>: add tablemaster with paddleocr to detect and recognize table * @strongerfly has signed the CLA in opendatalab/MinerU#487 * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * Update cla.yml * Delete .github/workflows/gpu-ci.yml * Update Huggingface and ModelScope links to organization account * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table * feat<table model>: add tablemaster with paddleocr to detect and recognize table --------- Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> Co-authored-by:
yyy <102640628+dt-yy@users.noreply.github.com> Co-authored-by:
wangbinDL <wangbin_research@163.com> --------- Co-authored-by:
Kaiwen Liu <lkw_buaa@163.com> Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
sfk <18810651050@163.com> Co-authored-by:
drunkpig <60862764+drunkpig@users.noreply.github.com> Co-authored-by:
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by:
Aoyang Fang <222010547@link.cuhk.edu.cn> Co-authored-by:
liukaiwen <liukaiwen@pjlab.org.cn> Co-authored-by:
wangbinDL <wangbin_research@163.com>
-
drunkpig authored
-
drunkpig authored
* fix replace \u0002, \u0003 in common text * fix(para): When an English line ends with a hyphen, do not add a space at the end.
-
Xiaomeng Zhao authored
-
drunkpig authored
* fix replace \u0002, \u0003 in common text * fix(para): When an English line ends with a hyphen, do not add a space at the end.
-
- 31 Aug, 2024 1 commit
-
-
Xiaomeng Zhao authored
-
- 30 Aug, 2024 3 commits
-
-
Xiaomeng Zhao authored
-
icecraft authored
* Create requirements-docker.txt * feat: update deps to support rag * feat: add support to rag, add rag_data_reader api for rag integration * feat: let user retrieve the filename of the processed file * feat: add projects demo for rag integrations --------- Co-authored-by:
Xiaomeng Zhao <moe@myhloli.com> Co-authored-by:
icecraft <xurui1@pjlab.org.cn>
-
Xiaomeng Zhao authored
* feat(cli&analyze&pipeline): add start_page and end_page args for paginationAdd start_page_id and end_page_id arguments to various components of the PDF parsing pipeline to support pagination functionality. This feature allows users to specify the range of pages to be processed, enhancing the efficiency and flexibility of the system. * feat(cli&analyze&pipeline): add start_page and end_page args for paginationAdd start_page_id and end_page_id arguments to various components of the PDF parsing pipeline to support pagination functionality. This feature allows users to specify the range of pages to be processed, enhancing the efficiency and flexibility of the system. * feat(cli&analyze&pipeline): add start_page and end_page args for paginationAdd start_page_id and end_page_id arguments to various components of the PDF parsing pipeline to support pagination functionality. This feature allows users to specify the range of pages to be processed, enhancing the efficiency and flexibility of the system.
-
- 28 Aug, 2024 5 commits
-
-
drunkpig authored
-
Xiaomeng Zhao authored
Previously, small blocks that overlapped with larger ones were merely removed. This fix changes the approach to merge smaller blocks into the larger block instead, ensuring that no information is lost and the larger block encompasses all the text content fully.
-
Xiaomeng Zhao authored
Reduce the span threshold used in fill_spans_in_blocks from 0.6 to 0.3 to improve the accuracy of block filling based on layout analysis.
-
wangbinDL authored
-
yyy authored
-