I came across the textract library, which extracts text from various file formats. I was interested in using it to extract text from HTML files. Here is what I did to get it to install on my machine.
The installation documentation outlines some steps that you need to perform. These steps can be found at the following URL:
I wanted to install it for Python 3.4 on Ubuntu 14.04, but it seemed to support only Python 2.x. Here is what I did to get it to install for Python 3.4.
Set up your virtualenv first:
virtualenv -p /usr/bin/python3.4 /usr/local
1. install the required libraries for Linux as outlined on the installation page.
2. download the source file for textract from
3. untar the downloaded file
4. cd into the directory and look for occurrences of:
except ShellError, e:
and change it to
except ShellError as e:
5. edit the requirements/python file and comment out
6. install the Python 3 equivalent
pip install pdfminer3k
7. finally run
python3.4 setup.py install
Everything should install at this point.
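If there are many occurrences to change in step 4, the edit can be scripted instead of done by hand. The following is a minimal sketch, not part of textract: the names fix_except_syntax and fix_tree are my own, and the regular expression only handles the simple one-line form "except SomeError, name:" (including a parenthesized tuple of exceptions). It sticks to calls available in Python 3.4.

```python
import os
import re

# Matches the Python 2 form "except SomeError, name:" at the start of a line.
# Assumption: the whole except clause sits on a single line.
OLD_EXCEPT = re.compile(
    r"^([ \t]*except[ \t]+)(.+?)[ \t]*,[ \t]*(\w+)[ \t]*:",
    re.MULTILINE,
)


def fix_except_syntax(source):
    """Rewrite 'except X, e:' as 'except X as e:' and leave other text alone."""
    return OLD_EXCEPT.sub(r"\1\2 as \3:", source)


def fix_tree(root):
    """Apply the rewrite to every .py file under root; return the paths changed."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as handle:
                original = handle.read()
            updated = fix_except_syntax(original)
            if updated != original:
                with open(path, "w", encoding="utf-8") as handle:
                    handle.write(updated)
                changed.append(path)
    return changed
```

Calling fix_tree with the path of the untarred source directory rewrites the files in place and returns the list of files it touched, so it is worth reviewing that list (or working on a copy) before running setup.py.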
My name is Tyson Maly. I am a computer engineer and have worked in the financial services industry for the past 9 years. I have also moonlighted as a consultant building web applications for businesses. This site has been around since 2003. I have been programming in Perl since 1996, and I have been developing websites since 1995. I have a wide variety of skills and can program the full stack, from the devops side to the user interface.