SandbagTiara2816@lemmy.dbzer0.com to

Python@programming.dev · 9 months ago

Modules for extracting data from PDF?


I’m not a software developer, but I like to use Python to help speed up some of my office work. One of my regular tasks is to print a stack of ~40 sheets of paper, highlight key information for each entry (about 3 entries per page), and fill out a spreadsheet with that information that then gets loaded into our software.

This is time-consuming, and I’d like to write a program that can scan the OCR-ed PDFs and pull the relevant information into a CSV.

I’m confident I could handle it from there, but I know that PDFs are tricky files to work with. Are there any Python modules that might be a good fit for the approach I’m hoping to take here? Thanks!

milkisklim · 9 months ago
From what I understand, PyPDF3 and PyPDF4 are separate from pypdf, which is the modern version of PyPDF2 as of last year.

- charolastra@lemmy.world · 9 months ago
  That’s correct afaik. The maintainers of PyPDF2 merged it back into the original pypdf for version 3 I believe.
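
For the workflow described in the original post, a rough sketch with pypdf might look like the following. It assumes the PDFs already carry an OCR text layer and that each entry follows a fixed "Name / Date / Amount" layout. That layout, the field names, and the helper function names are all hypothetical stand-ins, so the regular expression would need to be adapted to the real documents. Only `PdfReader`, `reader.pages`, and `page.extract_text()` come from pypdf itself.

```python
import csv
import io
import re

# Hypothetical entry layout: three labelled lines per entry.
# The real labels depend entirely on the documents being processed.
ENTRY_PATTERN = re.compile(
    r"Name:\s*(?P<name>.+?)\s*\n"
    r"Date:\s*(?P<date>\S+)\s*\n"
    r"Amount:\s*(?P<amount>[\d.,]+)"
)


def pdf_to_text(path):
    """Return the text layer of every page, joined with newlines."""
    # Imported here so the parsing helpers below work without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can come back empty for pages with no text layer.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def parse_entries(text):
    """Find every entry matching the assumed layout; return a list of dicts."""
    return [match.groupdict() for match in ENTRY_PATTERN.finditer(text)]


def write_csv(entries, out):
    """Write the parsed entries to an open text file or buffer as CSV."""
    writer = csv.DictWriter(out, fieldnames=["name", "date", "amount"])
    writer.writeheader()
    writer.writerows(entries)
```

Usage would be along the lines of `entries = parse_entries(pdf_to_text("scan.pdf"))`, followed by `write_csv(entries, f)` on a file opened with `newline=""`. Text extracted from OCR-ed PDFs often has irregular spacing and line breaks, so it is worth printing the raw output of `pdf_to_text` for one page before settling on a pattern.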
