elstar IT

Fullstack | Java | Tech Speaker | Tech Coach | Frank van der Linden


HR Assistant – Document Conversion but different

01-12-2017 | flinden68 | business, community, development, hrassistant

Document conversion was one of the top items on our backlog that didn't make it into the final contest version.

What is Watson Document Conversion

The Document Conversion API documentation page describes the service as follows:

“The IBM Watson™ Document conversion service converts a single HTML, PDF, or Microsoft Word™ document into a normalized HTML, plain text, or a set of JSON-formatted Answer units that can be used with other Watson services. Carefully inspect output to make sure that it contains all elements and metadata required by your or your organization’s security standards.”

How we use it

In HR Assistant we use the Watson Document Conversion Service in the Job Application part.

A job applicant uploads a resume in PDF or DOCX format. It is stored, together with the rest of the application data, in the job application document in Cloudant.

When someone analyses the job application data, the following happens:

Step 1: If there is an attachment, it is sent to the Watson Document Conversion service, which returns its text representation.

[Figure: flow of the Watson Document Conversion Service]

Step 2: The text from the document is combined with the other text of the job application and sent to the various Watson services we use in HR Assistant.
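The two steps above can be sketched in Java as follows. This is a minimal illustration, not the HR Assistant code: the class name `JobApplicationText`, the method `combine`, and passing the converter in as a `Function<byte[], String>` are assumptions made for the example. In HR Assistant the converter would be the call to the conversion service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class JobApplicationText {

    /**
     * Hypothetical helper that builds the text passed on to the other Watson services.
     * The converter stands in for the remote document conversion call.
     */
    public static String combine(Optional<byte[]> attachment,
                                 List<String> applicationFields,
                                 Function<byte[], String> converter) {
        List<String> parts = new ArrayList<>();
        // Step 1: only convert when the applicant actually uploaded a resume
        attachment.map(converter).ifPresent(parts::add);
        // Step 2: append the remaining text fields of the job application
        parts.addAll(applicationFields);
        return String.join("\n", parts);
    }

    public static void main(String[] args) {
        // A fake converter so the sketch runs without any remote service
        Function<byte[], String> fakeConverter = bytes -> "Resume text (" + bytes.length + " bytes)";
        String combined = combine(Optional.of(new byte[] {1, 2, 3}),
                List.of("Motivation letter", "Skills: Java"),
                fakeConverter);
        System.out.println(combined);
    }
}
```

Injecting the converter keeps the combining logic independent of which conversion backend is used, which is convenient when that backend has to be replaced.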

Change of plans

Watson Document Conversion is already deprecated. For standalone conversion, as we do in HR Assistant, there is no migration path to other Watson services. IBM recommends using Apache Tika for this sort of operation.

So we have to refactor the current Document Conversion service in HR Assistant.

Hello Document Conversion API

Due to security restrictions in Domino, I created a simple Spring Boot application with 2 endpoints. I deployed it to Bluemix via the command line and secured the endpoints with the built-in API Management.

Endpoints

URL: /api/convert-to-plain-text
Method: POST
Consumes: File
Produces: JSON
{
  "convertedText": "text",
  "message": "information or error message"
}
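A client for this endpoint has to send the file as a multipart upload. The sketch below builds a `multipart/form-data` body by hand and wraps it in a request with the JDK's `java.net.http` client (Java 11+). The form field name `file`, the boundary value, the class name, and the base URL `http://localhost:8080` are assumptions for illustration; the endpoint description does not say which field name it expects, so check the Swagger page before relying on it.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class PlainTextUploadRequest {

    /** Builds a single-part multipart/form-data body (RFC 7578) containing one file. */
    public static byte[] multipartBody(String boundary, String fieldName,
                                       String fileName, String contentType, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + contentType + "\r\n\r\n";
        String tail = "\r\n--" + boundary + "--\r\n";
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        out.writeBytes(tail.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Hypothetical request for POST /api/convert-to-plain-text; the field name "file" is assumed. */
    public static HttpRequest uploadRequest(String baseUrl, String fileName,
                                            String contentType, byte[] content) {
        String boundary = "----DocConversionBoundary";
        byte[] body = multipartBody(boundary, "file", fileName, contentType, content);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/convert-to-plain-text"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4 ...".getBytes(StandardCharsets.UTF_8);
        HttpRequest request = uploadRequest("http://localhost:8080", "resume.pdf", "application/pdf", pdf);
        // Only builds the request; nothing is sent here
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending it with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` should return JSON in the format described for this endpoint, assuming the service is reachable and the field name matches.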

File request

URL: /api/convert-data-to-plain-text
Method: POST
Consumes: JSON
{
  "mediaType": "text",
  "data": "base64 encoded representation of text/file"
}
Produces: JSON
{
  "convertedText": "text",
  "message": "information or error message"
}
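For this endpoint the file travels inside the JSON body. A minimal sketch using only JDK classes: `java.util.Base64` encodes the bytes and the JSON string is assembled by hand (a real client would more likely use a JSON library such as Jackson). The class name, the base URL, and the media type values such as `text/plain` are assumptions; the endpoint description only shows `"text"` as a placeholder for `mediaType`.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataConversionRequest {

    /** Builds {"mediaType": ..., "data": ...} with the file bytes Base64-encoded. */
    public static String jsonBody(String mediaType, byte[] fileBytes) {
        String data = Base64.getEncoder().encodeToString(fileBytes);
        // Base64 output only contains [A-Za-z0-9+/=], so it needs no JSON escaping;
        // the media type is assumed to be a plain token such as "application/pdf".
        return "{\"mediaType\": \"" + mediaType + "\", \"data\": \"" + data + "\"}";
    }

    /** Hypothetical request for POST /api/convert-data-to-plain-text. */
    public static HttpRequest request(String baseUrl, String mediaType, byte[] fileBytes) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/convert-data-to-plain-text"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(mediaType, fileBytes)))
                .build();
    }

    public static void main(String[] args) {
        byte[] content = "Hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(jsonBody("text/plain", content));
        // prints {"mediaType": "text/plain", "data": "SGVsbG8="}
    }
}
```

This variant is handy when the caller already holds the file as Base64, for example when it is read from a Cloudant attachment, because no multipart encoding is needed.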

Swagger

To make it easy to test the API, I have added Swagger.

[Figure: Swagger UI of the Document Conversion API]

Bonus

As a bonus, I have also added Docker support, so it can run anywhere. You can find the instructions in the README.

The Document Conversion API can be found in a public repository on Bitbucket. Feel free to use it. Let me know if you have new feature requests, or better yet, add them yourself and submit a pull request.

Tags: development, hrassistant, opensource


About the author

My name is Frank van der Linden and I am an independent software developer based in the Netherlands. For the last 2 years I have been awarded IBM Champion. I am also on the board of OpenNTF. My specialisations are Java, web development and Domino.

All the code on this blog is licensed under the Apache License 2.0.
