Unverified Commit 58bfcc9c authored by Xiaomeng Zhao's avatar Xiaomeng Zhao Committed by GitHub

fix(pdf-parse-union-core): #492 decrease span threshold for block filling (#500)

Reduce the span threshold used in fill_spans_in_blocks from 0.6 to 0.3 to
improve the accuracy of block filling based on layout analysis.
parent 2c3e35fe
...@@ -175,7 +175,7 @@ def parse_page_core(pdf_docs, magic_model, page_id, pdf_bytes_md5, imageWriter, ...@@ -175,7 +175,7 @@ def parse_page_core(pdf_docs, magic_model, page_id, pdf_bytes_md5, imageWriter,
sorted_blocks = sort_blocks_by_layout(all_bboxes, layout_bboxes) sorted_blocks = sort_blocks_by_layout(all_bboxes, layout_bboxes)
'''将span填入排好序的blocks中''' '''将span填入排好序的blocks中'''
block_with_spans, spans = fill_spans_in_blocks(sorted_blocks, spans, 0.6) block_with_spans, spans = fill_spans_in_blocks(sorted_blocks, spans, 0.3)
'''对block进行fix操作''' '''对block进行fix操作'''
fix_blocks = fix_block_spans(block_with_spans, img_blocks, table_blocks) fix_blocks = fix_block_spans(block_with_spans, img_blocks, table_blocks)
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment