Working with PDF Documents in Python
First, we need to understand what a PDF document is. The term is probably familiar to you, but it may be new to some readers. PDF stands for Portable Document Format. It is simply a format for sharing content digitally. So how can we work with PDF files in Python?
Python is a very versatile programming language, and for working with PDF documents there is a module called PyPDF2. It is a third-party package, not part of the standard library, and it is written purely in Python.
Installing PyPDF2 Module
C:\Users\Your Name>pip install pypdf2
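After installing, it can help to confirm which release of PyPDF2 you actually have, because the method names differ between releases (the PdfFileReader, getPage and extractText names used in this post belong to releases before 3.0). The snippet below is a minimal sketch using only the standard library; installed_version is a hypothetical helper name, and importlib.metadata requires Python 3.8 or later.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the PyPDF2 version, or None if the package is not installed.
print(installed_version("PyPDF2"))
```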
Extracting Metadata
PDF metadata is data that provides more information about a particular PDF file. It often includes details such as the creation date, the author, and the application that created the file.
EXAMPLE
n =
pdfMeta(n)
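The definition of pdfMeta is not shown above, so here is a minimal sketch of what such a helper might look like. It is not the original code. It assumes PyPDF2 2.0 or later, where the reader class is PdfReader and the document information is available as reader.metadata. The names summarize_metadata and pdf_meta are hypothetical.

```python
def summarize_metadata(info):
    """Map raw PDF info-dictionary keys to friendlier names.

    `info` is any dict-like object (or None) using standard PDF keys
    such as '/Author'. Keys that are absent are simply skipped.
    """
    labels = {
        "/Title": "title",
        "/Author": "author",
        "/Creator": "creator",      # application that authored the document
        "/Producer": "producer",    # application that produced the PDF
        "/CreationDate": "created",
    }
    info = info or {}
    return {label: str(info[key]) for key, label in labels.items() if key in info}


def pdf_meta(path):
    """Return a dict of metadata and the page count for the PDF at `path`."""
    # Imported inside the function so summarize_metadata can be tried
    # without PyPDF2 installed. Assumes PyPDF2 >= 2.0.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    summary = summarize_metadata(reader.metadata)  # metadata may be None
    summary["pages"] = len(reader.pages)
    return summary
```

Calling pdf_meta with the path to one of your own PDF files returns a plain dictionary that you can print or inspect. Fields that the file does not define are left out.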
Extracting Text
Now we will see how to extract text from a PDF file using the PyPDF2 module.
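from PyPDF2 import PdfFileReader  # legacy name; PyPDF2 3.0+ uses PdfReader

pdf = PdfFileReader(file)    # file: a PDF opened in binary mode ('rb')
page = pdf.getPage(10)       # page indexes start at 0, so this is the 11th page
text = page.extractText()    # text content of that page as a string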
path =
pdfText(path)
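The definition of pdfText is also not shown, so the following is only a sketch of how such a helper could be written. It is not the original code. It assumes PyPDF2 2.0 or later, where pages are reached through reader.pages and text through extract_text(). The names extract_page_text and pdf_text are hypothetical. The sketch checks the page index first, since asking for page 10 fails on a document with fewer than 11 pages.

```python
def extract_page_text(reader, page_index):
    """Return the text of one page from an already opened reader.

    `page_index` is zero-based: 0 is the first page.
    """
    pages = reader.pages
    if not 0 <= page_index < len(pages):
        raise IndexError(
            f"page index {page_index} out of range for {len(pages)} page(s)"
        )
    # Fall back to an empty string if no text could be extracted.
    return pages[page_index].extract_text() or ""


def pdf_text(path, page_index=0):
    """Open the PDF at `path` and return the text of the requested page."""
    # Imported inside the function so extract_page_text can be tried
    # without PyPDF2 installed. Assumes PyPDF2 >= 2.0.
    from PyPDF2 import PdfReader

    return extract_page_text(PdfReader(path), page_index)
```

Text extraction depends on how the PDF was produced. Scanned pages that contain only images will usually come back as empty text.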
Thanks for Reading