Skills To Get Ahead In Your Career

getaheadskills.com

How To Work With PDFs in Python for Beginner Programmers

Posted on July 21, 2022 (updated July 22, 2022) by Arthur Torres

Python is an intuitive programming language that offers a wide range of features to help developers write and edit code. When programming with Python, you can use different libraries and functions to modify PDF files, for example by watermarking pages or highlighting sections of a document. Learning these techniques can help you tailor your files to your specific requirements.

In this article, we go over nine ways you can work with PDFs in Python to help you improve your programming skills for your career.

How to work with PDFs in Python

Here are nine ways you can work with PDF files in Python:

1. Extract and add text

To extract text from a PDF in Python, install PDFMiner by running the command “pip install pdfminer.six”. Once the library installs, you can extract text from a PDF. Here’s an example showing how to extract text from a PDF in Python using this library:

from io import StringIO

from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage

# PDFMiner analyzers
rsrcmgr = PDFResourceManager()
sio = StringIO()
codec = "utf-8"
laparams = LAParams()
device = TextConverter(rsrcmgr, sio, codec=codec, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)

# Path to our input file
pdf_file = "sample.pdf"

# Extract text page by page; the file is closed when the block ends
with open(pdf_file, "rb") as fp:
    for page in PDFPage.get_pages(fp):
        interpreter.process_page(page)

# Return text from StringIO
text = sio.getvalue()
print(text)

# Freeing up resources
device.close()
sio.close()

Here’s an example of writing text to a PDF file with the ReportLab library, which you can install with “pip install reportlab”:

from reportlab.lib.colors import red
from reportlab.lib.pagesizes import LETTER
from reportlab.lib.units import inch
from reportlab.pdfgen.canvas import Canvas

# Creating the PDF file
canvas = Canvas("text_file.pdf", pagesize=LETTER)

# Setting the font and the font size
canvas.setFont("Courier", 16)

# Setting the color of the font to red
canvas.setFillColor(red)

# Writing this text on the PDF file
canvas.drawString(2 * inch, 8 * inch, "This is a freshly produced Python PDF.")
canvas.save()
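
The drawString() call positions text in points, ReportLab’s default unit (1 inch is 72 points), measured from the bottom-left corner of the page. As a rough sketch of that arithmetic, the snippet below recomputes the position in plain Python without ReportLab. The constant and function names are illustrative and are not part of the library.

```python
# Illustrative constants: a point is 1/72 inch, and US Letter is 8.5 x 11 inches.
POINTS_PER_INCH = 72.0
LETTER_SIZE = (8.5 * POINTS_PER_INCH, 11 * POINTS_PER_INCH)  # (612.0, 792.0)


def text_position(x_inches, y_inches):
    """Convert a position in inches to points from the bottom-left corner."""
    return (x_inches * POINTS_PER_INCH, y_inches * POINTS_PER_INCH)


def distance_from_top(y_points, page_size=LETTER_SIZE):
    """Return how far a y coordinate sits below the top edge, in inches."""
    return (page_size[1] - y_points) / POINTS_PER_INCH


x, y = text_position(2, 8)
print(x, y)                  # 144.0 576.0
print(distance_from_top(y))  # 3.0
```

So the text in the example starts 2 inches from the left edge and 3 inches below the top of a Letter page.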


2. Rotate pages

Here’s an example of rotating pages with the PyPDF2 library:

# rotate_pages.py
# Uses the legacy camelCase PyPDF2 API (PyPDF2 versions before 3.0)

from PyPDF2 import PdfFileReader, PdfFileWriter

def rotate_pages(pdf_path):
    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(pdf_path)
    # Rotate page 90 degrees to the right
    page_1 = pdf_reader.getPage(0).rotateClockwise(90)
    pdf_writer.addPage(page_1)
    # Rotate page 90 degrees to the left
    page_2 = pdf_reader.getPage(1).rotateCounterClockwise(90)
    pdf_writer.addPage(page_2)
    # Add a page in normal orientation
    pdf_writer.addPage(pdf_reader.getPage(2))

    with open('rotate_pages.pdf', 'wb') as fh:
        pdf_writer.write(fh)

if __name__ == '__main__':
    path = 'Jupyter_Notebook_An_Introduction.pdf'
    rotate_pages(path)
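
A page’s rotation in a PDF is stored as a multiple of 90 degrees, so clockwise and counterclockwise turns are the same operation with opposite signs. The sketch below shows that bookkeeping in plain Python. Note that normalize_rotation is a hypothetical helper written for illustration, not a PyPDF2 function.

```python
def normalize_rotation(current, delta):
    """Return the page rotation after turning `delta` degrees clockwise.

    Negative `delta` values turn counterclockwise. Both arguments must be
    multiples of 90, and the result is always 0, 90, 180, or 270.
    """
    if current % 90 or delta % 90:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return (current + delta) % 360


print(normalize_rotation(0, 90))     # 90  (a quarter turn to the right)
print(normalize_rotation(0, -90))    # 270 (a quarter turn to the left)
print(normalize_rotation(270, 180))  # 90
```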

3. Merge PDFs

There are many situations in which you might want to merge several PDFs into a single PDF. For instance, you can merge a standard cover page with several PDF reports. With PyPDF2, you can do this by writing a function such as “merge_pdfs()”. Here’s an example:

# pdf_merging.py
# Legacy PyPDF2 API (versions before 3.0)

from PyPDF2 import PdfFileReader, PdfFileWriter

def merge_pdfs(paths, output):
    pdf_writer = PdfFileWriter()

    for path in paths:
        pdf_reader = PdfFileReader(path)
        for page in range(pdf_reader.getNumPages()):
            # Add each page to the writer object
            pdf_writer.addPage(pdf_reader.getPage(page))

    # Write out the merged PDF
    with open(output, 'wb') as out:
        pdf_writer.write(out)

if __name__ == '__main__':
    paths = ['document1.pdf', 'document2.pdf']
    merge_pdfs(paths, output='merged.pdf')
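
The order of the list passed to merge_pdfs() decides the page order of the result. When the inputs come from a folder listing, plain string sorting puts “report10.pdf” before “report2.pdf”. Here is a minimal sketch of one way to order the inputs first. The helper names and file names are hypothetical.

```python
import re


def natural_key(name):
    """Split a file name into text and integer chunks for human-style sorting."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


def order_for_merge(cover, reports):
    """Return the cover page first, followed by the reports in natural order."""
    return [cover] + sorted(reports, key=natural_key)


paths = order_for_merge("cover.pdf", ["report10.pdf", "report2.pdf", "report1.pdf"])
print(paths)
# ['cover.pdf', 'report1.pdf', 'report2.pdf', 'report10.pdf']
```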

4. Split PDFs

Instead of merging PDFs, you can split one document into multiple PDF files. This is useful when you want to share a large document as several smaller PDFs. Here’s an example showing how to split PDF files with the PyPDF2 library:

# pdf_splitting.py
# Legacy PyPDF2 API (versions before 3.0)

from PyPDF2 import PdfFileReader, PdfFileWriter

def split(path, name_of_split):
    pdf = PdfFileReader(path)
    for page in range(pdf.getNumPages()):
        pdf_writer = PdfFileWriter()
        pdf_writer.addPage(pdf.getPage(page))

        # One output file per page, e.g. jupyter_page0.pdf
        output = f'{name_of_split}{page}.pdf'
        with open(output, 'wb') as output_pdf:
            pdf_writer.write(output_pdf)

if __name__ == '__main__':
    path = 'Jupyter_Notebook_An_Introduction.pdf'
    split(path, 'jupyter_page')

5. Add watermarks

Watermarks are patterns and images you place on document pages to make them identifiable. You can add watermarks with a function such as “create_watermark()”. This function accepts three arguments: “input_pdf” for the path of the file you want to watermark, “output” for the path where you want to save the watermarked version, and “watermark” for a PDF that contains the watermark image or text.

Here’s an example:

# pdf_watermarker.py
# Legacy PyPDF2 API (versions before 3.0)

from PyPDF2 import PdfFileWriter, PdfFileReader

def create_watermark(input_pdf, output, watermark):
    watermark_obj = PdfFileReader(watermark)
    watermark_page = watermark_obj.getPage(0)

    pdf_reader = PdfFileReader(input_pdf)
    pdf_writer = PdfFileWriter()

    # Watermark all the pages
    for page_number in range(pdf_reader.getNumPages()):
        page = pdf_reader.getPage(page_number)
        page.mergePage(watermark_page)
        pdf_writer.addPage(page)

    with open(output, 'wb') as out:
        pdf_writer.write(out)

if __name__ == '__main__':
    create_watermark(
        input_pdf='Jupyter_Notebook_An_Introduction.pdf',
        output='watermarked_notebook.pdf',
        watermark='watermark.pdf')
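
The watermark loop stamps every page. If only some pages should be watermarked (or split out, as in the previous section), one option is to turn a range string such as “1-3,5” into zero-based page indexes first. This is only a sketch under that assumption, and parse_page_ranges is a hypothetical helper, not part of PyPDF2.

```python
def parse_page_ranges(spec, page_count):
    """Turn a string like "1-3,5" into sorted zero-based page indexes.

    Page numbers in `spec` are one-based, the way readers count pages.
    """
    indexes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        start = int(first)
        end = int(last) if last else start
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"invalid page range: {part!r}")
        indexes.update(range(start - 1, end))
    return sorted(indexes)


print(parse_page_ranges("1-3,5", page_count=6))  # [0, 1, 2, 4]
```

The resulting indexes can then be compared against the loop counter to decide which pages receive the watermark.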

6. Encrypt a PDF

Using the PyPDF2 library, you can encrypt your PDF file with a password. While you can’t set permissions on the file, you can add an owner password, which gives you administrator privileges over the entire PDF. Here’s an example using a function named “add_encryption()” and the writer’s “.encrypt()” method:

# pdf_encrypt.py
# Legacy PyPDF2 API (versions before 3.0)

from PyPDF2 import PdfFileWriter, PdfFileReader

def add_encryption(input_pdf, output_pdf, password):
    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(input_pdf)

    for page in range(pdf_reader.getNumPages()):
        pdf_writer.addPage(pdf_reader.getPage(page))

    pdf_writer.encrypt(user_pwd=password, owner_pwd=None,
                       use_128bit=True)

    with open(output_pdf, 'wb') as fh:
        pdf_writer.write(fh)

if __name__ == '__main__':
    add_encryption(input_pdf='reportlab-sample.pdf',
                   output_pdf='reportlab-encrypted.pdf',
                   password='twofish')
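
The example hard-codes the password “twofish” for brevity. For real files, a randomly generated password is harder to guess. Here is a minimal sketch using Python’s standard secrets module. The function name and the default length are illustrative choices.

```python
import secrets
import string


def generate_password(length=20):
    """Return a random password made of ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


password = generate_password()
print(len(password))  # 20
```

The result can be passed as the password argument of add_encryption(). Store it somewhere safe, because it is needed to open the encrypted file.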

7. Extract and add images

You can extract images from a PDF file using the PyMuPDF library, which is imported under the name “fitz”. You can install it by entering “pip install pymupdf” at the command line. The example also uses Pillow (“pip install pillow”) to save the images. Here’s an example showing how to extract images:

# PyMuPDF
# Uses PyMuPDF's older camelCase method names; newer releases
# spell them get_images() and extract_image()
import io

import fitz
from PIL import Image

# Path to our input file
pdf_path = "sample.pdf"

# Open the input PDF file
pdf_file = fitz.open(pdf_path)

for page_no in range(len(pdf_file)):
    curr_page = pdf_file[page_no]
    images = curr_page.getImageList()

    for image_no, image in enumerate(images):
        # Get the XREF of the image
        xref = image[0]
        # Extract the image bytes
        curr_image = pdf_file.extractImage(xref)
        img_bytes = curr_image["image"]
        # Get the image extension
        img_extension = curr_image["ext"]
        # Load it into PIL
        image = Image.open(io.BytesIO(img_bytes))
        # Save it to local disk
        image.save(f"page{page_no}_img{image_no}.{img_extension}")
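
The extraction result reports the file extension in its “ext” field. If that value ever needs double-checking, the first bytes of the image data identify the most common formats. The sketch below covers a few well-known signatures. The helper name is hypothetical and the table is far from complete.

```python
# Leading "magic" bytes of a few common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def guess_image_extension(img_bytes):
    """Return an extension based on the leading bytes, or None if unknown."""
    for signature, extension in SIGNATURES.items():
        if img_bytes.startswith(signature):
            return extension
    return None


print(guess_image_extension(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # png
print(guess_image_extension(b"not an image"))                      # None
```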

Here’s an example of the syntax for adding an image to a PDF:

from reportlab.lib.pagesizes import LETTER
from reportlab.pdfgen.canvas import Canvas

canvas = Canvas("add_image.pdf", pagesize=LETTER)
canvas.drawInlineImage("x.jpeg", 100, 450)

canvas.save()

8. Extract URLs

If you want to extract URLs from your PDF file, you can use the pdfx library. Install the pdfx module by entering the command “pip install pdfx”. After installing the library, you’re ready to carry out the extraction. Here’s an example demonstrating how to extract URLs:

import pdfx

# Reading the PDF file
pdf = pdfx.PDFx("sample.pdf")

# Get the list of URLs
print(pdf.get_references_as_dict())
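
If pdfx is not available, a rough alternative is to run a regular expression over text that has already been extracted from the PDF (for example with PDFMiner, as in section 1). The pattern below is a deliberately simple assumption that only catches http and https links, and the helper name is hypothetical.

```python
import re

# Simplified pattern: matches http(s) URLs up to the next whitespace or
# closing bracket/quote. Real-world URLs can be messier than this.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def find_urls(text):
    """Return the unique URLs found in `text`, in order of first appearance."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


sample = "Docs: https://example.com/guide. Mirror (https://example.org/a?b=1)."
print(find_urls(sample))
# ['https://example.com/guide', 'https://example.org/a?b=1']
```

Unlike pdfx, this approach only sees URLs that appear as visible text. Links stored solely as PDF annotations will not show up.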

9. Highlight text

To highlight text in your PDF, you can again use the PyMuPDF library, which is imported as “fitz”. After you install it, you’re ready to begin. Here’s an example showing how to highlight text:

# Uses PyMuPDF's older camelCase method names; newer releases
# spell them search_for() and add_highlight_annot()
import fitz

# Opening the PDF file
pdf_file = fitz.open("sample.pdf")

# Input text to be highlighted
text = "Recommender"

# Iterating through the pages to highlight the input phrase
for page in pdf_file:
    match_words = page.searchFor(text)

    for word in match_words:
        highlight = page.addHighlightAnnot(word)
        highlight.update()

# Saving the PDF file as highlighted.pdf
pdf_file.save("highlighted.pdf")

Please note that the company mentioned in this article is not associated with Indeed.
