
  • taxedo 4 hours ago
    I got tired of manually copying numbers from Form 16 PDFs into India’s tax filing portal every year. So I built *Form16x*, a Python CLI + library that turns these semi-structured PDFs into structured JSON.

    Beyond extraction, it can:
    - Consolidate multiple Form 16s (useful if you switched jobs in a year)
    - Calculate taxes under both regimes and recommend the better option
    - Show detailed salary and deduction breakdowns in the terminal (tree view, colored output)
    - Suggest tax optimizations (80C, 80D, NPS, etc.) with potential savings
    - Expose a Python API (`TaxCalculationAPI`) with multi-year tax rules (AY 2020–2025)
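To make the "both regimes" comparison concrete, here is a minimal sketch of how a slab-based comparison could work. It is not Form16x's actual code and does not use `TaxCalculationAPI`. The function names, the slab tables, and the rates are illustrative assumptions that have not been checked against any assessment year, and the sketch ignores cess, surcharge, rebates, and the standard deduction.

```python
from typing import Dict, List, Optional, Tuple

# Each slab is (upper_limit, rate); upper_limit=None means "no upper bound".
# ASSUMPTION: these tables are illustrative placeholders, not official rates.
Slabs = List[Tuple[Optional[float], float]]

OLD_SLABS: Slabs = [(250_000, 0.0), (500_000, 0.05), (1_000_000, 0.20), (None, 0.30)]
NEW_SLABS: Slabs = [(300_000, 0.0), (600_000, 0.05), (900_000, 0.10), (None, 0.30)]


def slab_tax(taxable_income: float, slabs: Slabs) -> float:
    """Apply progressive slabs: each rate applies only to income inside its band."""
    tax = 0.0
    lower = 0.0
    for upper, rate in slabs:
        if taxable_income <= lower:
            break
        top = taxable_income if upper is None else min(taxable_income, upper)
        tax += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return tax


def recommend_regime(gross: float, deductions: float) -> Dict[str, object]:
    """Hypothetical helper: old regime allows deductions, new regime does not."""
    old_tax = slab_tax(max(gross - deductions, 0.0), OLD_SLABS)
    new_tax = slab_tax(gross, NEW_SLABS)
    better = "old" if old_tax < new_tax else "new"
    return {"old": old_tax, "new": new_tax, "recommended": better}


if __name__ == "__main__":
    print(recommend_regime(gross=800_000, deductions=200_000))
```

The point of the sketch is that the recommendation flips with the size of the deductions: the same gross income can favor either regime depending on how much is claimed under sections like 80C or 80D.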

    *Repo:* https://github.com/ri-sh/Form16x

    ### Why I built it

    Form 16 is similar to a W-2 in the US or a T4 in Canada: a PDF tax certificate with inconsistent layouts across employers. Filing returns often means manually re-entering data, which is error-prone and time-consuming.

    Form16x tries to solve this with:
    - PDF parsing using camelot/pdfplumber with fallback logic
    - Structured output aligned with the form fields
    - Local-only processing (no data leaves your machine)
    - CLI polish (progress bars, colored display, breakdown trees)
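For readers curious what "fallback logic" between camelot and pdfplumber might look like, here is a rough sketch. It is an assumption about the general pattern, not the code in the repo: the function names and the ordering of extractors are hypothetical. The library calls used (`camelot.read_pdf`, `pdfplumber.open`, `page.extract_tables`) are the public entry points of those packages, imported lazily so the fallback chain itself runs without either installed.

```python
from typing import Callable, List, Sequence

Table = List[List[str]]
Extractor = Callable[[str], List[Table]]


def extract_with_camelot(path: str) -> List[Table]:
    """Try camelot's lattice mode, which works best on tables with ruled lines."""
    import camelot  # imported lazily; optional dependency

    tables = camelot.read_pdf(path, pages="all", flavor="lattice")
    return [t.df.values.tolist() for t in tables]


def extract_with_pdfplumber(path: str) -> List[Table]:
    """Collect every table pdfplumber finds on every page."""
    import pdfplumber  # imported lazily; optional dependency

    found: List[Table] = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            found.extend(page.extract_tables())
    return found


def extract_tables(path: str, extractors: Sequence[Extractor]) -> List[Table]:
    """Return the first non-empty result; an error or empty result triggers the next extractor."""
    errors: List[str] = []
    for extractor in extractors:
        try:
            tables = extractor(path)
        except Exception as exc:  # a parser crash should not abort the whole chain
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if tables:
            return tables
        errors.append(f"{extractor.__name__}: no tables found")
    raise ValueError("all extractors failed: " + "; ".join(errors))
```

Usage would be something like `extract_tables("form16.pdf", [extract_with_camelot, extract_with_pdfplumber])`, where the file name is a placeholder. Everything runs in-process on the local file, which fits the local-only processing goal.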

    Would love feedback from HN on both the technical side (PDF parsing + structured extraction) and whether this approach could extend to other countries’ tax forms.