OCR for construction documents does not work, we fixed it

(getanchorgrid.com)

120 points | by wcisco17 9 hours ago

21 comments

Terr_ 7 hours ago
> OCR for construction documents does not work
I'm reminded of the Xerox JBIG2 bug back in ~2013, where certain scan settings could silently replace numbers inside documents, and bad construction-plans were one of the cases that led to it being discovered. [0]
It wasn't overt OCR per se, end-user users weren't intending to convert pixels to characters or vice-versa.
[0] https://www.youtube.com/watch?v=c0O6UXrOZJo&t=6m03s
[-]
- TehCorwiz 7 hours ago
  If I recall it was an artifact of the compression algo.
  Full context and details: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...
- hackcasual 4 hours ago
  JBIG2 does glyph binning, as you say not exactly OCR, but similar. So chunks of the image that look sufficiently similar get replaced with a reference to a single instance.
  [-]
  - thaumasiotes 3 hours ago
    > not exactly OCR, but similar. So chunks of the image that look sufficiently similar get replaced with a reference to a single instance.
    How can we describe OCR that wouldn't match this definition exactly?
    [-]
    - yuliyp 1 hour ago
      Glyph binning looks for any chunks in the image that are similar to eachother, regardless of what they are. Letters, eyeballs, pennies, triangles, etc without caring what it is. OCR looks specifically to try and identify characters (i.e. it starts with a knowledge of an alphabet, then looks for things in the image that look like those.
      If the image is actually text, both of them can end up finding things. Binning will identify "these things look almost the same", while OCR will identify "these look like the letter M"
    - Dylan16807 2 hours ago
      Jbig2 dynamically pulls reference chunks out of the image, which makes it more likely to have insufficient separation between the target shapes.
      It also gives a false sense of security when it displays dirty pixels that still clearly show a specific digit, since you think you're basically looking at the original.
      [-]
      - thaumasiotes 2 hours ago
        That's a description of Jbig2, not a description of OCR.
        Jbig2 is an OCR algorithm that doesn't assume the document comes from a pre-existing alphabet.
        [-]
        Dylan16807 2 hours ago
        You asked what the difference was, and I said the difference. Was it unclear that to fit the phrasing of your question, we add "OCR doesn't"? I would not personally call Jbig2 OCR.
        [-]
        thaumasiotes 1 hour ago
        > You asked what the difference was, and I said the difference.
        Take another look at my comment.
        [-]
        Dylan16807 25 minutes ago
        Let me try rephrasing to make the response to your original comment as clear as possible.
        Question: "How can we describe OCR that wouldn't match this definition exactly?"
        Answer: This definition largely fits OCR, but "reference to a single instance" is a weird way to phrase it. A better definition of OCR would include how it uses builtin knowledge of glyphs and text structure, unlike JBIG2 which looks for examples dynamically. And that difference in technique gives you a significant difference in the end results.
        Is that better?
        The definition you quoted is not an "exact" fit to OCR, it's a mildly misleading fit to OCR, and clearing up the misleading part makes it no longer fit both.
h317 4 hours ago
I cannot wait for the day when tech companies become players in the construction industry because it looks like it is the only way forward to make a change.
To think that everything has been digitalized a long time ago, yet contract law cannot properly deal with delineating responsibilities between GC and Architects, who are still sending 2D drawings to each other.
Imagine, all this information about quantities and door types (and everything else) is already available and produced by the architect's team, BUT they cannot share it! Because if they do, they are responsible for the numbers in case something is wrong.
So now there is this circus of: Arch technologist making the base drawing with doors. GC receives documents, counts doors for verification, and sends them to the sub. Subcontractor looks at these drawings, counts them again, and sends data to the supplier. Guess what, the supplier also looks, counts, confirms, and back we go.
Though I think robotics will change all of that. And when we have some sort of bot assistance, big tech players will have a bigger leverage in this, which will lead to the proper change management architecture.
Anyway, cool product. Anything to help with estimation. Really hope it gets traction.
[-]
- punnerud 4 hours ago
  I had a job as HVAC engineer for the upgraded Oslo Airport back in 2011; started doing HVAC work for 3 weeks the rest was programming trying to make the rest of people more efficient. Made an Excel sheet with a lot of macros to manage all the drawing of the airport. That’s why I switched to programming when I continued to study, and did not want to come back before I got more experience.
  They even gave me a big desk at Trondheim/Tyholt so I could help them with the software during my studies.
  [-]
  - ajcp 2 hours ago
    For what it's worth I found Oslo Airport to be one of the best airport experiences I've ever had. 5 stars.
- alexeischiopu 35 minutes ago
  I couldn’t agree more. The fact this data isn’t programmable is really holding the industry behind.
  When building PlanGrid there were so many things we wished we could have done had this been unlocked.
  I’m now working on doing just that.
- wcisco17 3 hours ago
  Hey thanks!! For estimation we cover division 08 (doors and opening) you can use it for your estimating purposes with these two endpoints:
  - Counting all the doors: https://www.getanchorgrid.com/developer/docs/endpoints/drawi... - Extracting schedules in architectural drawings: https://www.getanchorgrid.com/developer/docs/endpoints/drawi...
  and use Claude or any other AI tool to wire up the UI
  We're releasing toilets (division 10) later this week, then floors and pipes next.
- ndespres 3 hours ago
  I’ve worked on projects where a lot of work was done in highly collaborative drawings on Bluebeam, in which vendors add their markups and items and the program facilitates counting it all at the end of the phase. My role was only in things like wireless AP placement and low voltage cabling drop locations, not anything safety critical like doors, but I assume those vendors were able to keep track of those items in a similar way. For actual engineering projects I’m glad so many people have to take the time to count.
sreekanth850 6 hours ago
We’re taking a different path, building a parsing engine that converts CAD (DWG/DXF) into fully structured JSON with preserved semantics (no ML in the critical path).We also have a separate GIS parser that extracts vector data (features, layers, geometries) independently, Like to know how you handle consistency and reproducibility across runs using models and how you make it affordable, especially at scale. because as far as i know CAD and GIS need precision and accuracy.
[-]
- wcisco17 5 hours ago
  interesting yeah parsing DWG/DXF natively makes sense when the source file is clean and well-structured. The precision argument is valid in controlled environments.
  The challenge we kept running into is that construction drawings in the wild aren’t always that clean. Unresolved xrefs, exploded dynamic blocks, version incompatibilities, SHX font substitutions — by the time a PDF hits a GC’s desk it’s often the only reliable artifact left. The CAD source may not even be available.
  That’s why we see vision becomes the more pragmatic path — not because it’s more precise than structured CAD parsing, but because PDFs are the actual lingua franca of construction. Every firm, every trade, every discipline hands off PDFs. So we made a bet on meeting the document where it actually lives.
  On consistency and reproducibility — that’s a real challenge with vision models. Our approach is to keep detection scope narrow and validate confidence scores on every output rather than trying to generalize broadly. Happy to go deeper on that if useful.
- runxel 4 hours ago
  Dumbcad line barf will not help you with that at all.
  There already is a format that is plain text and preserves the semantics: IFC. That's what it was made for.
- oneneptune 5 hours ago
  Is this a service / product you plan to offer outwardly? I'd be interested in learning more. Use case: estimation.
copypaper 52 minutes ago
Very interesting. Im on vacation but will check this out at work next week.
What is the maximum resolution you support for PDFs? The max gemini will do is 3072x3072. We have plans that are 10x that size.
[-]
- alexeischiopu 34 minutes ago
  What divisions information do you need?
  [-]
  - copypaper 25 minutes ago
    Mainly 09, but also 05 and 07.
petee 5 hours ago
I ran the example doors given and it missed 9 swinging doors, some that were in double swing pairs, and a few that were just out on their own not clustered. Not bad overall though
[-]
- wcisco17 3 hours ago
  Yep we're constantly improving we're currently above 0.87 for doors
  we're thinking of adding a params for the ROC curve so that you can decide your own optimal thresholds depend on when false positive true positive rate is acceptable
  [-]
  - petee 3 hours ago
    Oh nice! I spend a good amount of time eyeballing drawings for overlooked details, so even finding most is a handy tool to me as my brain can skip the marked areas
nostrapollo 3 hours ago
First off, congrats on the launch! Construction is a tough market to build in. My personal view after being in it a for a few years is that there is no shortage of MVPs. In fact there is an MVP for every problem at every level (or at least it feels that way) but construction is /vast/ and the rough edges that seem juicy at first, in practice are optimizations rather than bottlenecks for constructors.
I hope you succeed because it would be great to have a standard API for this data, but I would advise on one of two directions: become the standard by being close to 100% accurate at finding symbols (one symbol doesn't seem to cut it in our testing) or make a great, comprehensive workflow for a small subset of the market and become standard that way.
In both cases, you cannot do a broad 'market test', you need to spend many hours with a specific sub-set of users in construction.
Disclaimer: I'm a co-founder of Provision.
tomedwrds 3 hours ago
I have been working on an extension of this problem lately that involves extracting all doors + any details about those doors to produce quotes. I have found giving the pdf to codex pretty good at it as it can take subcrops of the plans to look at certain areas of high noise in more detail. Only downside is cost is quite high.
frogguy 6 hours ago
Looks cool! Where are you getting the data to finetune the cv models for element extraction? I'm worried there isn't a robust enough dataset to be able to build a detection model that will generalize to all of the slightly different standards each discipline (and each firm for that matter) use.
[-]
- wcisco17 6 hours ago
  good q — we don't train on customer drawings. Our detection models are trained on a curated dataset of architectural drawings we've sourced and labeled ourselves, focused on the most common fixture and element types across CSI divisions.
  The generalization problem you're pointing at is real and it's the hardest part of this. Our approach is to keep the detection scope tight — rather than trying to generalize across every firm's conventions, we train on a small but high-quality set of fixtures and optimize for precision within that scope.
  The result is high confidence outputs on the elements we support, rather than mediocre coverage across everything.
  We're expanding the detection surface incrementally as we validate accuracy division by division!
  [-]
  - dylan604 6 hours ago
    How in the world is an answer to a question from the account posting TFA replying directly to said question getting killed?
punnerud 4 hours ago
«Why we did it»; would rather have a “How we did it”. The why gave me AI generated marketing material feelings.
Tailscale’s article about NAT traversal is an example of how to write “how we did it”: https://tailscale.com/blog/how-nat-traversal-works
[-]
- wcisco17 3 hours ago
  Fair point and thanks for the tip. No AI here though.
mmethodz 2 hours ago
Do that for Finnish construction documents. My parser is 30000+ lines candidate based but the lack of standards and the Finnish language...
[-]
- wcisco17 1 hour ago
  Have you tried using it with Finnish construction documents? It should work for the detections that cover drawings.
testUser1228 7 hours ago
What do you foresee being the end use case for this (or most valuable use case)?
[-]
- wcisco17 7 hours ago
  Anyone building in or for construction tech — whether that's a startup building estimating or project management software, a construction company with an internal tech team solving this themselves, or a builder looking to automate their workflow. The common thread is drawings. Every one of those groups lives and dies by their ability to extract actionable data from a PDF that was never designed to be machine-readable. We're building the layer that makes that possible so they don't have to start from scratch.
  [-]
  - wang_li 7 hours ago
    Why does the workflow lie at the level of a real or virtual piece of paper and not in the metadata from the applications used to create that piece of paper? Seems like a CAD tool would allow you to identify each element of the drawing, assigning metadata as required.
    [-]
    - jsidney 7 hours ago
      Only a small set of construction stakeholders participate in the CAD ecosystem (e.g., architects, large GCs) while a broader set of stakeholders (subcontractors, trades, smaller GCs/CMs) do not receive BIM files and work with PDFs. CAD/BIM is a wonderful aspiration but for many the reality is PDFs.
      [-]
      - instig007 6 hours ago
        Re. "CAD/BIM", technically speaking CAD doesn't imply BIM, and the industry's promotion of BIM is akin to AI promotion among software engineering teams - the benefits aren't clear upon detailed review of the advertised capabilities. The CAD part, on the other hand, is generally recognized as the essential tooling for the profession and I'm surprised to hear that it just is a "wonderful aspiration".
        [-]
        eMPee584 3 hours ago
        "The profession" actually is a wide variety of trades, not just architects and contractors. Electricians, plumbers etc. where CAD is not yet widely spread. Which hopefully will change in the near future, with open source BIM tool chains, boosted by generative/agentic AI.. Finally, a huge source of confusion and execution hiccups will be overcome.
        [-]
        wcisco17 3 hours ago
        Until then pdf rules!!!
    - cyanydeez 7 hours ago
      Oh you sweet summer child. These draws are anywhere from 0 to 120 years old and might just be something pulled out of a floppy disk from 1970 to scanned in coffee ridden pieces of paper sitting in a desk folded a hundred times.
      The world in which metadata is a common thing attached to any file doesn't exist, and probably never will, no matter how much you try to improve CAD work flow.
      [-]
      - siriusfeynman 3 hours ago
        > Oh you sweet summer child
        I know you're just repeating a phrase from a TV show but do you know how incredibly condescending this comes across to most people?
Iulioh 8 hours ago
When will this be available for 30000x8000px electrical diagrams?
I have to make a BOM and oh boy I hate my job
[-]
- oritron 8 hours ago
  What software made the bitmap? Seems like a step earlier in the pipeline could help generate a BOM more easily.
  [-]
  - Iulioh 7 hours ago
    I'm not really sure and I don't have access to it, I just recive flat PDFs or TIFFs
    A lot of them are "archival" so I'm pretty OOL
    [-]
    - dylan604 5 hours ago
      You might even be SOL
      It is telling that so many of the comments here assume the person with a thing that is not the most practical would be easily able to request thing in a different format. The assumption that the person with the inconvenient thing would never have thought to ask if more convenient thing was available and just willfully toiling with the inconvenient thing is kind of insulting.
      [-]
      - oneneptune 5 hours ago
        Also, in the construction industry you get an updated drawing file a day before the bidding closes... good luck getting the GC to send more detailed files (that they themselves got elsewhere) in that time. You're better off sending it to your estimation department in India and letting them work through the night to put together the new estimations.
- alexeischiopu 7 hours ago
  I’m building a similar platform, with electrical being furthest ahead - SLD, panels, lights, power, comms.
  Also do doors, windows, and mechanical equipment.
  dm, and I can include you in the next preview.
  [-]
  - testUser1228 6 hours ago
    I'm not sure how to dm on here, but I'm very interested
    [-]
    - axus 5 hours ago
      You can paste "who is alexeischiopu" to a search engine, and since there isn't an athlete with the same name, a good candidate appears.
  - Iulioh 7 hours ago
    I work in the automotive field, I don't know if this complicates the things further but I appreciate any help!
- jsidney 8 hours ago
  What do you hate the most?
  [-]
  - stronglikedan 6 hours ago
    silly questions
hspraggins77 7 hours ago
Great points raised!
alexeischiopu 7 hours ago
Good idea :)
[-]
- wcisco17 7 hours ago
  Thanks!!
vessenes 7 hours ago
cool. What's pricing like?
[-]
- wcisco17 7 hours ago
  Thanks! https://www.getanchorgrid.com/developer/pricing
  Let me know if you find it useful or have any questions, happy to help.
  [-]
  - vessenes 7 hours ago
    Thanks -- btw the Pricing link on the site pulls up a form, not that page.
achillesheels 8 hours ago
Love it! Starbucks Vente Machiato sip
Love to give it to an arc client, not sure who the right person to implement this would be? Hmm…
[-]
- wcisco17 7 hours ago
  Hey OP here - Love to help if you're looking for a team to implement a solution.
  https://cal.com/anchorgrid/anchorgrid-external-meeting?durat...
i18nagentai 6 hours ago
[flagged]
[-]
- wcisco17 4 hours ago
  Yeah OCR for technical is largely solved we're targeting the construction documents space For construction specs we have a few endpoints: - https://www.getanchorgrid.com/developer/docs/endpoints/specs... - https://www.getanchorgrid.com/developer/docs/endpoints/specs...
  so you would want these documents translated lets say to German, mandarin, ect?
ware-intel 7 hours ago
Your smart features looks like a game changer? Nice job!
[-]
- jsidney 3 hours ago
  Thanks!!
fithisux 8 hours ago
Of course it is not working. PDF and images are supposed to be tamper resistant. OCR tries to reverse engineer them.
[-]
- kube-system 8 hours ago
  Since when is tamper resistance a part of PDF or any common image format?
  [-]
  - pwagland 7 hours ago
    PDF files can be signed, that is tamper resistance. Tamper resistance doesn't have to make any difference to the readability of the document.
    [-]
    - kube-system 7 hours ago
      So can any type of file -- that doesn't have any relevance to the supposed design of every file type in existence. Now, later versions of PDF do have explicit support for signatures, but what does this have to do with preventing OCR? OCR reads a file, it doesn't change the original file.
      [-]
      - ranger_danger 7 hours ago
        Some OCR solutions do change the original file, like OCRmyPDF. They take layers that were just images before and replace it with text layers so that you can search the document.
        [-]
        kube-system 7 hours ago
        That isn't OCR, but an application of the resulting output of OCR. Again, a signature on a PDF or any type of file doesn't prevent you from reading it. (It also doesn't technically prevent you from changing it, it just enables the detection of changes to a particular file.)
        There's nothing about PDFs or image formats that prevent anyone from doing OCR. The reason construction documents are difficult to OCR is because OCR models are not well trained for them, and they're very technical documents where small details are significant. It doesn't have anything to do with the file format
      - fithisux 7 hours ago
        True but you can make modified copies if you reverse engineer it with OCR.
        [-]
        jimjimjim 2 hours ago
        That's not really what I would call reverse engineering. If you read a pdf, and type it into word is that reverse engineering? Either way whatever you get is in no way going to convince anybody that it is the original.
    - ranger_danger 7 hours ago
      Can't one just remove the signature and re-sign it with anything else after tampering? Who verifies PDFs that hard?
      [-]
      - kube-system 7 hours ago
        If you're performing OCR, you're almost by definition, disregarding the source file. The whole point of OCR is to be transformative.
  - fithisux 7 hours ago
    You can't change a PDF, it is by design to be not easy to OCRed
    [-]
    - kube-system 6 hours ago
      PDFs are merely an collection of objects, that can be plainly read by reading the file -- some of those are straight up plain text that doesn't even need to be OCR'd, it can be simply extracted. It is also possible to embed image objects in PDFs, (this is common for scanned files) which might be what you are thinking of. But this is not a design feature of PDF, but rather the output format of a scanner: an image. Editing PDFs is a simple matter of simply editing a file, which you can do plainly as you would any other.
    - jimjimjim 2 hours ago
      It is not by design! PDFs that are made from scanned documents or collections of images would require OCRing but that is true of any format that the scans/images are put into. These days the vast majority of PDFs do not need to be OCRed as the pages are just made up of text, line drawings and images. And although it can get tricky you can edit those text, line and image commands as much as you want.
      For example: add this is in the contents stream for a pdf page and it'll put hello world on the page
      BT /myfont 50 Tf 100 200 Td (Hello World) Tj ET
      (Note: a bit more is required to select the font etc)