OAK-12046 - Update default Tika config #2671
base: trunk
Conversation
Marking as draft, as this seems to break full-text indexing of PDFs.
There is a class loading issue between tika-core and tika-parsers; up until now we did not explicitly configure any class from tika-parsers.
Adjust the class loader used for loading Tika configurations to allow configuring the PDFParser. By default, Tika does not use the context class loader, so we plug it into the existing abstraction. This effectively substitutes the tika-core class loader with the oak-lucene class loader, given that FulltextBinaryTextExtractor ends up being embedded in oak-lucene.
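To illustrate the kind of problem described above, here is a minimal, self-contained sketch in plain Java. It is not the code from this PR and does not use Tika at all: `LoaderDemo`, `canResolve`, `withContextClassLoader`, and `isolatedLoader` are hypothetical names made up for the example. The isolated loader stands in for a class loader that cannot see the classes named in a configuration (roughly the tika-core side of the issue), while the class's own loader stands in for the loader that can see them (roughly the oak-lucene side).

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

// Hypothetical demo class; not part of Oak or Tika.
final class LoaderDemo {

    /**
     * Returns true if the given loader can resolve the class name.
     * A configuration parser that instantiates classes by name depends
     * on exactly this lookup succeeding.
     */
    static boolean canResolve(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * Runs the task with the given loader installed as the thread context
     * class loader and restores the previous loader afterwards.
     */
    static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> task) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return task.get();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }

    /**
     * A loader whose only parent is the bootstrap loader. It can see JDK
     * classes but not application classes, which mimics a loader that
     * cannot see the configured parser classes.
     */
    static ClassLoader isolatedLoader() {
        return new URLClassLoader(new URL[0], null);
    }
}
```

With this sketch, `LoaderDemo.canResolve(LoaderDemo.class.getName(), LoaderDemo.isolatedLoader())` returns false, while the same lookup against `LoaderDemo.class.getClassLoader()` succeeds. The point is only that the loader handed to the name lookup decides whether a configured class is visible; how Tika's own loading abstraction is wired up in this change is not shown here.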
thomasmueller left a comment
This looks reasonable to me. I'm afraid I'm not an expert on Tika / class loading, but I don't see any obvious error.
Still some class loading issues to clarify.