It would be great to preserve the OCR'd text from the hOCR files, but does well-known text (WKT) support "arbitrary text"? Does GeoJSON?
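
For what it's worth, here is a rough sketch of how the two formats differ on this point. As far as I know, WKT only describes geometry and has no slot for attributes, so the text would have to travel next to the WKT string (a separate column or field). A GeoJSON `Feature` has a `properties` member that can hold any JSON object, so the text can sit alongside the geometry. The function names, the `"text"` property key, and the sample word and bounding box are made up for illustration. The bounding box is assumed to be the `x0 y0 x1 y1` pixel values from an hOCR `bbox`. Pixel coordinates are not the geographic coordinates GeoJSON normally expects, so treat this as a convenience encoding rather than strictly conforming output.

```python
import json


def word_to_geojson_feature(text, bbox):
    """Wrap one OCR'd word and its bounding box in a GeoJSON Feature.

    `bbox` is assumed to be (x0, y0, x1, y1) in page pixel coordinates.
    """
    x0, y0, x1, y1 = bbox
    # A polygon ring must be closed: the last point repeats the first.
    ring = [[x0, y0], [x1, y0], [x1, y1], [x0, y1], [x0, y0]]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        # "properties" can be any JSON object; "text" is a hypothetical key.
        "properties": {"text": text},
    }


def bbox_to_wkt(bbox):
    """Render the same box as WKT. There is nowhere in the string for the text."""
    x0, y0, x1, y1 = bbox
    return f"POLYGON (({x0} {y0}, {x1} {y0}, {x1} {y1}, {x0} {y1}, {x0} {y0}))"


# Hypothetical word and box, only to show the shape of the output.
feature = word_to_geojson_feature("Hello", (10, 20, 110, 50))
print(json.dumps(feature))
print(bbox_to_wkt((10, 20, 110, 50)))
```

So with GeoJSON the text round-trips inside the feature itself, while with WKT whatever stores the geometry string would need an extra field for it.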