File Content Extractor

  1. Specify key words, phrases, or patterns of text to locate in files

  2. Set input settings to filter which files are searched

  3. Customise the output to appear on screen, or export structured XML files for further analysis or ingestion into other systems via your own integration

  4. Let the software scan through your files to locate the information you’re looking for
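
For readers who think in code, the four steps above can be pictured with a small Python sketch. This is an illustration only, not File Content Extractor's implementation or configuration format: the `CRITERIA` names, the `INV-` number pattern, the `scan_folder` function, and the XML element names are all assumptions made for the example.

```python
import re
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Step 1 (hypothetical criteria): one literal phrase and one regular-expression pattern.
CRITERIA = {
    "invoice_marker": re.compile(re.escape("Tax Invoice")),
    "invoice_number": re.compile(r"INV-\d{6}"),
}


def scan_folder(folder, extensions=(".txt", ".csv")):
    """Search files under `folder` and return an XML element describing the matches."""
    root = ET.Element("results")
    for path in sorted(Path(folder).rglob("*")):
        # Step 2: input filter, here by file extension only.
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        file_el = None
        # Step 4: scan the text for every criterion.
        for name, pattern in CRITERIA.items():
            for match in pattern.finditer(text):
                if file_el is None:
                    file_el = ET.SubElement(root, "file", path=path.name)
                ET.SubElement(file_el, "match", criterion=name).text = match.group(0)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.txt").write_text("Tax Invoice\nRef: INV-004211\n", encoding="utf-8")
        Path(tmp, "b.log").write_text("Tax Invoice\n", encoding="utf-8")  # excluded by the filter
        # Step 3: structured XML output, printed to the screen here.
        print(ET.tostring(scan_folder(tmp), encoding="unicode"))
```

The sketch reads plain-text files only; PDFs, Office documents, emails, and scanned images would each need their own text-extraction step before the search.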

Potential Use-Cases Across Many Industries

  • Searching for evidence in legal cases

  • Finding medical history in patient documentation

  • Verifying information for auditing

  • Digital forensic investigations in policing

  • Locating information for remediation programmes

  • Classifying files for business processes

  • Many more!

Our solution is highly configurable and not tethered to a particular industry or use-case.

Support for Lots of Common File Types

  • PDFs, both structured text and scanned photocopies via optical character recognition

  • Microsoft Office, including Word, Excel, and PowerPoint

  • Emails, including attachments within the emails

  • Common image files for scanned documents, such as BMP, JPG, PNG, TIF, and GIF

  • Text files, including code files, web pages, XML, JSON, and CSV

Faster Than Any Human

  • Why assign a dozen staff members to spend weeks searching through files when you can license software to do it for you at a fraction of the cost?

  • QWERTY Software Solutions’ File Content Extractor multi-threads file searches to efficiently read documents, turning a single desktop computer into a dedicated team of virtual file reviewers
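
As a rough picture of what multi-threaded file searching means, the Python sketch below checks several files concurrently with a thread pool. It is a generic illustration of the technique, not the product's code: the function names `file_contains` and `search_files` and the default of 8 workers are assumptions for the example.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def file_contains(path, phrase):
    """Return True if the file's text contains `phrase` (case-insensitive)."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return phrase.lower() in text.lower()


def search_files(paths, phrase, workers=8):
    """Check many files concurrently; return the paths that contain `phrase`, in input order."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with `paths`.
        hits = list(pool.map(lambda p: file_contains(p, phrase), paths))
    return [p for p, hit in zip(paths, hits) if hit]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp, "first.txt")
        second = Path(tmp, "second.txt")
        first.write_text("This is a Tax Invoice.", encoding="utf-8")
        second.write_text("Nothing relevant here.", encoding="utf-8")
        print([p.name for p in search_files([first, second], "tax invoice")])
```

Threads suit this kind of work because much of the time is spent waiting on disk or network reads, so several files can be in progress at once.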

No Model Training Required

  • Unlike solutions advertised as “AI” (Artificial Intelligence), File Content Extractor requires no training and no set of thousands of files pre-reviewed by people to produce the expected output

  • This solution is designed to locate text based on “actuals”, not “assumptions”

  • This isn’t AI. It’s better.

Security of your Data Comes First

  • Unlike providers that require you to move your data to their processing centre, File Content Extractor runs inside your own network, so your documents never leave your business domain. This removes the data-leakage and privacy risks of transferring files to a third party

  • File Content Extractor makes no API calls, sends no data externally, and only saves information you configure it to save.

Pay for a Licence, Get Free Access to Expertise

  • After helping many business projects and operational teams utilise our unique file extraction method, QWERTY Software Solutions can help you navigate implementation to minimise risk and maximise results

  • You also get free support for using the software, including configuration and training

Flexible Licensing Options

  • Each licence purchased is tied to a single computer and user account (no registration is required, and no Internet access is required to apply the licence)

  • Licences are available in 1-, 3-, 6-, and 12-month options

  • As a special promotion until the end of December 2025, any 1-, 3-, or 6-month licence purchased will automatically have its length doubled for FREE (e.g. pay for a 6-month licence, get a 12-month licence)!

Free Demonstration Before Deciding to Purchase

  • We tailor our solution package for every customer, so you cannot directly purchase a licence and download the software from the website at this time.

  • Contact us for a free demonstration online

  • We can even demonstrate using sample files you provide to us in advance, so you can verify the expected results when run within your environment

Frequently Asked Questions

Can I extract text from a PDF?

Yes, PDF files are supported, and text is extracted based on simple search terms (e.g. “Tax Invoice”) or pattern matching (e.g. “Dear [Customer Name],”).
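
To show the difference between the two kinds of search, here is a small Python sketch that runs once the text has been extracted from a document. The regular expression is only one assumed reading of the “Dear [Customer Name],” pattern (one or more capitalised words followed by a comma), and the `extract` function is hypothetical; the product's own pattern syntax may differ.

```python
import re

# Simple search term: a literal phrase.
SIMPLE_TERM = "Tax Invoice"

# Assumed pattern: "Dear ", a name made of one or more capitalised words, then a comma.
GREETING = re.compile(r"Dear ((?:[A-Z][\w'-]*)(?: [A-Z][\w'-]*)*),")


def extract(text):
    """Return whether the simple term appears and any names captured by the pattern."""
    return {
        "has_tax_invoice": SIMPLE_TERM in text,
        "customer_names": GREETING.findall(text),
    }


sample = "Tax Invoice\n\nDear Jane Smith,\nPlease find your invoice attached."
print(extract(sample))
```

The simple term answers a yes/no question, while the pattern captures the variable part of the text (here, the customer's name) for output.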

Does it work if documents are photocopied?

Yes, optical character recognition is included and can convert photocopied documents to text. Reliability depends on the quality of the document.

Does any of our data leave our network or organisation?

No. Rather than moving large amounts of data outside your organisation, our solution leaves the data where it is and brings the computing resources in. Instead of exporting potentially gigabytes of data, you import only a small application that fits on a standard CD or USB drive.

How do I know it will work? Can I try it for free?

Before purchasing, we offer free demonstrations online and can include any test files you’re willing to share prior to the demonstration.

Do we need to train it to work for our files?

No, our solution is not based on training AI models. If your file type is supported by the software and the file is readable (i.e. not encrypted or corrupt), it will work out of the box.

What is the process for purchasing?

  1. Contact us for a demonstration and to help us understand your use-case, so we can advise whether and how this technology can assist you

  2. If you’re happy with the demonstration, send us a list of computer names and user account names so we can generate licences and send you an invoice

  3. When the invoice is paid, we send you the licences and a copy of the application for installation on your computers

We don’t want a desktop application. Do you have a web service instead?

No, we don’t offer this, for two reasons. The first is efficiency: when you need to process thousands of files or many gigabytes of data, it is faster to process the data where it is hosted than to transfer it to an external service. The second is the security of your data. Because the application runs in your environment, your data stays as safe as it already is there, which makes the software easier to adopt from a risk, privacy, legal, and operational point of view.

How long does it take to process files?

There is no single answer, because the time taken depends on how many files there are, how many pages are in each file, the file content itself (images versus text), the speed of your network (where applicable), the power of your computer, and the number of computers you run it on. As a reference point, in our own performance profiling and stress testing the application processed 200,000 files (Excel workbooks with one worksheet per file) in under 4 minutes on one computer, with the files read from a local hard drive, while searching for 2 pieces of information (one simple search and one pattern search).
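
The benchmark figure can be turned into a rough rate with simple arithmetic, sketched below in Python. The function names are made up for this example, and the result describes that one workload only (single-worksheet Excel files on a local drive with two search criteria); it is not a prediction for other file types, OCR-heavy documents, or network storage.

```python
def files_per_second(file_count, seconds):
    """Average throughput for a completed run."""
    return file_count / seconds


def estimate_minutes(file_count, rate_per_second):
    """Estimated duration in minutes at a given throughput."""
    return file_count / rate_per_second / 60


# 200,000 files in "under 4 minutes" means at least this rate for the benchmark workload.
benchmark_rate = files_per_second(200_000, 4 * 60)
print(f"Benchmark rate: at least {benchmark_rate:.0f} files per second")

# Hypothetical extrapolation to one million comparable files.
print(f"Estimated time: about {estimate_minutes(1_000_000, benchmark_rate):.0f} minutes")
```

At 200,000 files in 240 seconds, the benchmark works out to roughly 833 files per second, so a comparable set of one million files would take about 20 minutes on the same hardware.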

How do we define the search criteria? What governance is recommended over the output?

Customers get free support to help write and test their search criteria. We can help you design governance and workflow based on our experience with previous use-cases and your specific needs. We generally advise running 3 partial runs as tests to fine-tune the search criteria before you run your content extractions and searches in bulk.

Questions?

Contact us any time to discuss your project needs and how we can assist.