fix: Change ocr_line <span> to include all ocr_word (#169)
Fixes the XML for ocr_line. The ocr_line span should enclose all of its ocrx_word spans.
ralscha committed Oct 3, 2023
1 parent 3c3f09d commit bc44dab
Showing 2 changed files with 77 additions and 76 deletions.
@@ -18,9 +18,10 @@
 {% set paridx = loop.index0 -%}
 <span class='ocr_par' id='par_{{ page_number }}_{{ bidx }}_{{ paridx }}' title='{{ paragraph.hocr_bounding_box -}}'>{% for line in paragraph.lines -%}
 {% set lidx = loop.index0 -%}
-<span class='ocr_line' id='line_{{ page_number }}_{{ bidx }}_{{ paridx }}_{{ lidx }}' title='{{ line.hocr_bounding_box }}'>{{ line.text }}</span>{% for token in line.tokens -%}
+<span class='ocr_line' id='line_{{ page_number }}_{{ bidx }}_{{ paridx }}_{{ lidx }}' title='{{ line.hocr_bounding_box }}'>{{ line.text }}{% for token in line.tokens -%}
 {% set tidx = loop.index0 -%}
-<span class='ocrx_word' id='word_{{ page_number }}_{{ bidx }}_{{ paridx }}_{{ lidx }}_{{ tidx }}' title='{{ token.hocr_bounding_box }}'>{{ token.text }}</span>{% endfor -%}{% endfor -%}
+<span class='ocrx_word' id='word_{{ page_number }}_{{ bidx }}_{{ paridx }}_{{ lidx }}_{{ tidx }}' title='{{ token.hocr_bounding_box }}'>{{ token.text }}</span>{% endfor -%}
+</span>{% endfor -%}
 </span>{% endfor -%}
 </span>{% endfor -%}
 </div>
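
The nesting this change is after can be checked on rendered output. The following is a minimal sketch, not part of the commit: it uses only Python's standard `html.parser`, and the class name `WordNestingChecker`, the helper `count_words`, and the two sample strings are hypothetical illustrations (the samples omit the `title` attributes for brevity). It counts how many `ocrx_word` spans open while an `ocr_line` span is still open.

```python
from html.parser import HTMLParser


class WordNestingChecker(HTMLParser):
    """Counts ocrx_word spans inside and outside an open ocr_line span."""

    def __init__(self):
        super().__init__()
        self.open_spans = []    # class attribute of each currently open <span>
        self.nested_words = 0   # ocrx_word spans enclosed by an ocr_line
        self.orphan_words = 0   # ocrx_word spans with no enclosing ocr_line

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        cls = dict(attrs).get("class", "")
        if cls == "ocrx_word":
            if "ocr_line" in self.open_spans:
                self.nested_words += 1
            else:
                self.orphan_words += 1
        self.open_spans.append(cls)

    def handle_endtag(self, tag):
        if tag == "span" and self.open_spans:
            self.open_spans.pop()


def count_words(hocr):
    """Return (nested, orphan) ocrx_word counts for an hOCR fragment."""
    checker = WordNestingChecker()
    checker.feed(hocr)
    checker.close()
    return checker.nested_words, checker.orphan_words


# Hypothetical rendered fragments: old template shape vs. new template shape.
before = (
    "<span class='ocr_line' id='line_1_0_0_0'>Hello world</span>"
    "<span class='ocrx_word' id='word_1_0_0_0_0'>Hello</span>"
    "<span class='ocrx_word' id='word_1_0_0_0_1'>world</span>"
)
after = (
    "<span class='ocr_line' id='line_1_0_0_0'>Hello world"
    "<span class='ocrx_word' id='word_1_0_0_0_0'>Hello</span>"
    "<span class='ocrx_word' id='word_1_0_0_0_1'>world</span>"
    "</span>"
)

print(count_words(before))  # (0, 2)
print(count_words(after))   # (2, 0)
```

With the old shape both words are siblings of the line span; with the new shape both are its children.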