Document conversion was one of the top items on our backlog that didn’t make it into the final contest version.
What is Watson Document Conversion?
The Document Conversion API documentation page describes the service as follows:
“The IBM Watson™ Document conversion service converts a single HTML, PDF, or Microsoft Word™ document into a normalized HTML, plain text, or a set of JSON-formatted Answer units that can be used with other Watson services. Carefully inspect output to make sure that it contains all elements and metadata required by your or your organization’s security standards.”
How we use it.
In HR Assistant we use the Watson Document Conversion Service in the Job Application part.
A job applicant uploads a resume in PDF or docx format. The data is stored in the document on Cloudant.
When someone analyses the data of a job application, two steps run:
Step 1: If there is an attachment, it is sent to the Watson Document Conversion service, which returns its text representation.
Step 2: The text from the document is combined with the other text of the job application and sent to the various Watson services we use in HR Assistant.
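The two steps can be sketched in a few lines of Java. This is only an illustration, not the HR Assistant code: the class name, the method name, and the idea of passing the converter in as a function are all assumptions made for the example.

```java
import java.util.function.Function;

// Hypothetical sketch of the two analysis steps; none of these names
// come from the HR Assistant code base.
final class ApplicationTextBuilder {

    private ApplicationTextBuilder() {
    }

    /**
     * Step 1: convert the attachment (if any) to plain text.
     * Step 2: append that text to the other text of the job application.
     *
     * @param applicationText text already present in the job application
     * @param attachment      raw bytes of the uploaded resume, or null if none
     * @param converter       assumed stand-in for the document conversion call
     */
    static String buildAnalysisText(String applicationText,
                                    byte[] attachment,
                                    Function<byte[], String> converter) {
        if (attachment == null || attachment.length == 0) {
            return applicationText;
        }
        String attachmentText = converter.apply(attachment);
        if (attachmentText == null || attachmentText.isBlank()) {
            return applicationText;
        }
        return applicationText + "\n\n" + attachmentText;
    }
}
```

The combined string is what would then be sent on to the other Watson services. Keeping the converter behind a function makes it easy to swap the conversion backend later.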
Change of plans
Watson Document Conversion is already deprecated. For standalone conversion, as we do in HR Assistant, there is no migration path to other Watson services. IBM recommends using Apache Tika for this sort of operation.
So we have to refactor the current Document Conversion service in HR Assistant.
Hello Document Conversion API
Due to security restrictions in Domino, I created a simple Spring Boot application with two endpoints. I deployed it to Bluemix via the command line and secured the endpoints with the built-in API Management.
Endpoints
Url: /api/convert-to-plain-text
Method: POST
Consumes: File
Produces: JSON
{
"convertedText": "text",
"message": "information or error message"
}
Url: /api/convert-data-to-plain-text
Method: POST
Consumes: JSON
{
"mediaType": "text",
"data": "base64 encoded representation of text/file"
}
Produces: JSON
{
"convertedText": "text",
"message": "information or error message"
}
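To give an idea of how a client could call the second endpoint, here is a minimal sketch that uses only the Java standard library (Java 11 or later). The class name, the base URL, and the example media type value are assumptions for illustration. Only the path and the two JSON field names come from the endpoint description above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Base64;

// Hypothetical client helper; not part of the Document Conversion API project.
final class ConvertDataRequest {

    private ConvertDataRequest() {
    }

    // Builds the JSON body with the file content base64 encoded.
    static String toJson(String mediaType, byte[] content) {
        String data = Base64.getEncoder().encodeToString(content);
        return "{\"mediaType\":\"" + escape(mediaType) + "\",\"data\":\"" + data + "\"}";
    }

    // Builds (but does not send) the POST request. baseUrl is an assumption,
    // for example the host of your own deployment.
    static HttpRequest toHttpRequest(String baseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/convert-data-to-plain-text"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Escapes backslashes and double quotes so the media type stays valid JSON.
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The request can then be sent with `java.net.http.HttpClient`, and the response body should contain the `convertedText` and `message` fields. If the endpoints are secured through API Management, the required credentials or headers would have to be added as well.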
Swagger
To make it easy to test the API, I have added Swagger.
Bonus
As a bonus, I have also added Docker support, so it can run anywhere. You can find the instructions in the ReadMe.
The Document Conversion API can be found in a public repository on Bitbucket. Feel free to use it. Even better, let me know if you have a feature request, or add the feature yourself and submit a pull request.