Optical Character Recognition (OCR) has revolutionized the way businesses automate document processing. However, the technology’s accuracy doesn’t cut it for every application: the more complex the document being processed, the less accurate the recognition becomes. This is especially true for engineering drawings. Although out-of-the-box OCR technologies may not be suited to this task, there are other ways to achieve your document processing goals with OCR. In what follows, I’ll explore several viable solutions to give you a general idea without going into too much technical detail.
Challenges of Engineering Drawing Recognition
When it comes to technical drawings, OCR can read the text, but it struggles to understand what the individual text elements mean. If automatic recognition of technical documents is configured correctly, it opens up a number of opportunities for engineers and manufacturers, but several challenges have to be addressed first. The most significant of them are outlined below.
To achieve complex technical documentation analysis, engineers need to train AI models. Just like humans, AI models need experience and training to understand these drawings.
One challenge of recognizing blueprints and engineering drawings is that the software must understand how to separate the different views of the drawing. Views are the distinct parts of a drawing that together give a basic idea of its layout. By separating the views and understanding how they relate to one another, the software can calculate a bounding box for each of them.
This process may include several challenges:
- Views might overlap
- Views might be damaged
- Labels might be equidistant to two views
- Views might be nested
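To make the view-separation problem more concrete, here is a minimal Python sketch of how bounding boxes can expose some of the cases listed above. It is an illustration under stated assumptions, not a production approach: the `Box` class, the pixel coordinates, and the `tolerance` value are all hypothetical, and it assumes that some earlier step has already grouped the drawing into rectangular parts.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical layout)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def union(self, other: "Box") -> "Box":
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))

    def overlaps(self, other: "Box") -> bool:
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)

    def contains(self, other: "Box") -> bool:
        """True when `other` lies fully inside this box (a nested view)."""
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)

    def distance_to(self, other: "Box") -> float:
        """Shortest gap between two boxes; 0.0 if they touch or overlap."""
        dx = max(other.x0 - self.x1, self.x0 - other.x1, 0.0)
        dy = max(other.y0 - self.y1, self.y0 - other.y1, 0.0)
        return hypot(dx, dy)


def view_bounding_box(parts: List[Box]) -> Box:
    """Smallest box that encloses every part belonging to one view."""
    box = parts[0]
    for part in parts[1:]:
        box = box.union(part)
    return box


def assign_label(label: Box, views: Dict[str, Box],
                 tolerance: float = 2.0) -> Optional[str]:
    """Return the nearest view, or None if two views are about equally close."""
    ranked = sorted(views, key=lambda name: label.distance_to(views[name]))
    if len(ranked) > 1:
        best, second = (label.distance_to(views[n]) for n in ranked[:2])
        if second - best <= tolerance:
            return None  # ambiguous: leave it for a smarter rule or a human
    return ranked[0]


front = view_bounding_box([Box(0, 0, 40, 30), Box(30, 10, 100, 60)])
side = view_bounding_box([Box(140, 0, 200, 60)])
views = {"front": front, "side": side}
print(assign_label(Box(105, 20, 115, 30), views))  # front
print(assign_label(Box(115, 20, 125, 30), views))  # None (equidistant)
```

Returning `None` for an equidistant label is a deliberate choice in this sketch: it is usually safer to flag an ambiguous label than to attach it to the wrong view.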
The relationship between views is another possible issue. You must consider whether the view is a flat part of the diagram, a turned part, a block, or something else. Additionally, there may be problems such as chained measures, missing annotations, or heights that are defined only implicitly through reference to a standard.
Importantly, generic OCR cannot reliably read text in drawings when it is surrounded by graphical elements like lines, symbols, and annotations. For this reason, we need to look more closely at OCR combined with machine learning, which is better suited to this application.
Pre-Trained and Custom OCR Models
There’s no shortage of OCR software on the market, but not all of this software can be trained or modified by the user. As we’ve learned, training may be a necessity for analyzing your engineering drawings. However, OCR tools for these kinds of drawings do exist.
Pre-Trained OCR Tools
Here are some common options for running OCR on engineering drawings:
- ABBYY FineReader: this versatile OCR software offers text recognition that can be applied to blueprints. It supports various image formats, layout retention, data export, and integrations.
- Adobe Acrobat Pro: in addition to providing PDF editing, viewing, and management, Acrobat allows you to run OCR on scanned documents and blueprints, extract text, and perform searches. It supports various languages and allows users to configure options.
- Bluebeam Revu: another popular PDF application, Bluebeam Revu offers OCR technologies for engineering drawing text extraction.
- AutoCAD: this computer-aided design (CAD) application supports OCR plugins for interpreting blueprints and converting them into editable CAD elements.
- PlanGrid: this software includes blueprint OCR interpretation out of the box. With this feature, you can upload blueprint images and then extract, organize, index, and search the text.
- Amazon Textract: this cloud-based AWS service enables OCR analysis of documents and can extract elements like tables. It can also recognize elements from blueprints and provides APIs for integration with other applications.
- Butler OCR: providing developers with document extraction APIs, Butler OCR combines machine learning with human review to enhance the accuracy of document recognition.
Custom OCR Solutions
If you’re looking for custom OCR solutions that can be trained to achieve better automatic data extraction from engineering drawings and adapted to your specific data format, here are a few popular options:
- Tesseract: this flexible, open-source OCR engine, whose development was sponsored by Google for many years, can be trained on custom data to recognize blueprint-specific characters and symbols.
- OpenCV: the Open Source Computer Vision Library can be combined with OCR tools like Tesseract to build custom interpretative solutions. Its image processing and analysis functions can enhance the accuracy of OCR on engineering drawings when properly utilized.
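As a rough illustration of how these two tools can be combined, the sketch below binarizes a drawing with OpenCV and passes it to Tesseract through the `pytesseract` wrapper. It assumes that `opencv-python`, `pytesseract`, and the Tesseract binary are installed; the function names and default values are my own, and the right page segmentation mode and preprocessing will depend on your drawings.

```python
def tesseract_config(psm: int = 11, whitelist: str = "") -> str:
    """Build a Tesseract configuration string.

    `--psm 11` is Tesseract's "sparse text" page segmentation mode, which
    looks for text scattered over the page rather than in paragraphs.
    The whitelist is optional and must not contain spaces or quotes here.
    """
    parts = [f"--psm {psm}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def read_drawing_text(image_path: str, whitelist: str = "") -> str:
    """Binarize a scanned drawing and return the text Tesseract finds in it."""
    # Imported lazily so tesseract_config() works without these packages.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks a global threshold automatically; unevenly lit
    # scans may need adaptive thresholding instead.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(
        binary, config=tesseract_config(whitelist=whitelist))


print(tesseract_config(whitelist="0123456789."))
```

Calling `read_drawing_text("drawing.png")` with a hypothetical file name would return the raw recognized text. Restricting the character set with a whitelist can help when you only care about dimensions, although how well it works may vary between Tesseract versions.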
Aside from these tools, it’s also possible to independently develop custom machine learning models. By training models on labeled datasets with frameworks like TensorFlow or PyTorch, these solutions can be fine-tuned to recognize specific blueprint elements and achieve higher accuracy for the needs of an organization.
Pre-trained models offer convenience and ease of use but may not be as effective at interpreting engineering drawings as custom solutions. Custom solutions, in turn, require additional financial resources, labor, and expertise to develop and maintain.
For that reason, I would recommend starting with a proof of concept (PoC) to validate technical capabilities and a minimum viable product (MVP) to check the market’s perception of the project before investing too heavily in a custom OCR solution.
The Process of Implementing an OCR Module for Reading Engineering Drawings
The best place to start building OCR software for engineering drawings is to analyze the available open-source tools. If you exhaust your open-source options, you may need to turn to closed-source options with API integrations.
Building an OCR solution from scratch is usually impractical because it requires a huge training dataset, which is difficult and expensive to gather, as well as a lot of resources for model training. In most cases, fine-tuning existing models should suit your needs.
The process from here looks something like this:
- Consider requirements: you need to understand what kind of engineering drawings your application should work with and what kinds of features and functionalities are needed to achieve that goal.
- Image capture and pre-processing: think about what devices you plan to use to capture the images. Extra pre-processing steps may be needed to enhance the quality of your results. This may include cropping, resizing, denoising, and more.
- OCR integration: choose the OCR engine that will work best with your application. OCR libraries expose APIs that allow your application to extract text from captured images. Open-source OCR solutions are worth considering for cost savings, since third-party APIs can change their pricing over time or lose support.
- Text recognition and processing: next, it’s time to implement logic to process and recognize text. Some possible tasks you may consider adding in this step are text cleanup, language recognition, or any other techniques that can provide clearer text recognition results.
- User interface and experience: an easy-to-use UI for the app is important so that the user can effectively use it to capture images and initiate OCR. The results should be presented to the user in a way that’s easy to understand.
- Testing: thoroughly test the application to ensure its accuracy and usability. User feedback is essential to this process.
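Two of the steps above, text cleanup and testing, lend themselves to a small illustration. The following Python sketch repairs a few typical digit look-alikes in numeric tokens and measures the character error rate (CER) against a hand-checked reference. The substitution table, the decimal-comma handling, and the function names are assumptions made for this example and would need to be tuned to your own drawings.

```python
import re

# Characters OCR engines often confuse inside numbers.
# This table is an assumed, non-exhaustive example.
_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})
_NUMERIC_LIKE = re.compile(r"^[0-9OolISB]+([.,][0-9OolISB]+)?$")


def clean_token(token: str) -> str:
    """Repair digit look-alikes, but only in tokens that look numeric."""
    if _NUMERIC_LIKE.match(token) and any(ch.isdigit() for ch in token):
        # Assumption: a comma inside a number is a decimal separator.
        return token.translate(_DIGIT_FIXES).replace(",", ".")
    return token


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete a character from a
                current[j - 1] + 1,            # insert a character into a
                previous[j - 1] + (ca != cb),  # substitute if they differ
            ))
        previous = current
    return previous[-1]


def character_error_rate(predicted: str, expected: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not expected:
        return 0.0 if not predicted else 1.0
    return edit_distance(predicted, expected) / len(expected)


raw = "1O.5 mm"          # hypothetical OCR output
reference = "10.5 mm"    # what a human reads on the drawing
cleaned = " ".join(clean_token(t) for t in raw.split())
print(character_error_rate(raw, reference))
print(character_error_rate(cleaned, reference))
```

Tracking a metric like CER on a fixed set of hand-labeled drawings makes it easier to tell whether a change to pre-processing or cleanup actually improves results, alongside the user feedback mentioned above.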
In the face of the challenges of creating OCR software for complex engineering drawings, organizations have a number of options for approaching the issue. From pre-trained models to customizable tools for building more personalized solutions, businesses can find ways to effectively analyze, index, and search through blueprints and other complex documents. All it takes is some ingenuity, creativity, and time to craft a solution that meets their needs.