    refactor(para): improve paragraph splitting algorithm · 8cc76c49
    myhloli authored
    - Adjust the threshold for identifying index blocks from 3 lines to 2 lines
    - Add a new function __is_list_group to detect if a group of blocks is a list
    - Modify the paragraph merging logic to handle list groups differently
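
The listing does not show the body of `__is_list_group` or the merging code, so the following is only a minimal sketch of the idea described in the commit message, not the repository's implementation. The block shape (`{"lines": [...]}`), the marker regex, and the names `is_index_block`, `is_list_group`, and `merge_group` are all hypothetical; only the 2-line threshold comes from the commit message.

```python
import re
from typing import Dict, List

# Hypothetical list-marker pattern: bullets, or "1." / "1)" / "(1)" numbering.
LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\(?\d+[.)])\s+")

# Threshold named in the commit message (lowered from 3 lines to 2).
MIN_INDEX_LINES = 2

# Assumed block shape for this sketch: {"lines": ["text", ...]}.
Block = Dict[str, List[str]]


def is_index_block(block: Block, min_lines: int = MIN_INDEX_LINES) -> bool:
    """Treat a block as an index block if it has enough lines and
    every line starts with a list marker (an assumed heuristic)."""
    lines = block["lines"]
    return len(lines) >= min_lines and all(LIST_MARKER.match(ln) for ln in lines)


def is_list_group(blocks: List[Block]) -> bool:
    """A non-empty group counts as a list when all its blocks are index blocks."""
    return bool(blocks) and all(is_index_block(b) for b in blocks)


def merge_group(blocks: List[Block]) -> List[str]:
    """Return the paragraphs for a group of blocks.

    List groups keep one paragraph per line; any other group is
    joined into a single paragraph.
    """
    all_lines = [ln.strip() for b in blocks for ln in b["lines"]]
    if is_list_group(blocks):
        return all_lines
    return [" ".join(all_lines)]
```

With these assumptions, a group of bulleted blocks keeps its items separate, while ordinary text blocks are still merged into one paragraph.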
Name
dict2md
filter
integrations
layout
libs
model
para
pipe
post_proc
pre_proc
resources
rw
spark
tools
__init__.py
pdf_parse_by_ocr.py
pdf_parse_by_txt.py
pdf_parse_union_core.py
pdf_parse_union_core_v2.py
user_api.py