poppler-utils and pandoc
Reference guide: convert-pdf-to-markdown-linux
Result: poppler-utils and pandoc are installed.
Explanation:
sudo apt install poppler-utils pandoc
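Before moving on, it can help to confirm that the installed tools are actually on the PATH. The following is a minimal sketch, assuming a POSIX shell; the helper name missing_tools is hypothetical, while pdftotext, pdfimages, and pandoc are the binaries used in the rest of this guide.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

# Check the three binaries used in this guide.
missing=$(missing_tools pdftotext pdfimages pandoc)
if [ -n "$missing" ]; then
  # Word splitting is intended here: one "Missing:" line per tool.
  printf 'Missing: %s\n' $missing
else
  echo "All tools found."
fi
```

If anything is reported as missing, rerunning the apt command above is the first thing to try.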
Result: The PDF is converted to a text file and images are extracted.
Explanation:
mkdir -p images
pdftotext -layout ../corebook.pdf corebook.txt
pdfimages -all ../corebook.pdf images/image
pdftotext -layout preserves the original physical layout of the text.
pdfimages -all extracts every image from the PDF into the images/ directory, using image as the file name prefix.
pandoc
Result: Two Markdown files are generated (corebook.md and corebook-f.md). The difference between the two is minimal.
Explanation:
pandoc -t markdown corebook.txt -o corebook.md
pandoc -f markdown corebook.txt -o corebook-f.md
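-t sets the output format and -f sets the input format. In both commands the input ends up being read as Markdown and the output written as Markdown, so the two files are nearly identical and pandoc reformats the text only minimally.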
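To see the two flags side by side, both can be given explicitly. The sketch below uses a made-up sample file in a temporary directory (neither is part of the original workflow) and compares pandoc's own Markdown with gfm, the GitHub-Flavored Markdown output format. The pandoc calls are skipped when pandoc is not installed.

```shell
# Work in a throwaway directory so nothing in the current one is touched.
workdir=$(mktemp -d)

# Made-up sample input standing in for corebook.txt.
printf 'Chapter 1\n=========\n\nSome *emphasised* text.\n' > "$workdir/sample.txt"

if command -v pandoc >/dev/null 2>&1; then
  # Read Markdown, write pandoc's own Markdown.
  pandoc -f markdown -t markdown "$workdir/sample.txt" -o "$workdir/sample.md"
  # Read Markdown, write GitHub-Flavored Markdown instead.
  pandoc -f markdown -t gfm "$workdir/sample.txt" -o "$workdir/sample-gfm.md"
  # Print any differences between the two writers (diff exits 1 when they differ).
  diff "$workdir/sample.md" "$workdir/sample-gfm.md" || true
fi
```

Swapping gfm for another writer is a quick way to check which output format needs the least cleanup for a given text.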
The book's git repository was cloned afterwards, because it is easier to work directly from the source:
git clone git@github.com:tinycorelinux/corebook.git
# List pandoc supported input formats
pandoc --list-input-formats | paste -sd,
biblatex,bibtex,bits,commonmark,commonmark_x,creole,csljson,csv,docbook,docx,dokuwiki,endnotexml,epub,fb2,gfm,haddock,html,ipynb,jats,jira,json,latex,man,markdown,markdown_github,markdown_mmd,markdown_phpextra,markdown_strict,mediawiki,muse,native,odt,opml,org,ris,rst,rtf,t2t,textile,tikiwiki,tsv,twiki,typst,vimwiki
# Direct PDF to Markdown conversion in one command
pdftotext input.pdf - | pandoc -t markdown -o output.md
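The one-command pipeline extends naturally to a whole directory of PDFs. The following is a sketch, assuming a POSIX shell; md_name and convert_all are hypothetical helper names, and the pipeline inside the loop is the same pdftotext-to-pandoc command as above.

```shell
# Hypothetical helper: derive the Markdown file name from a PDF file name.
md_name() {
  printf '%s.md\n' "${1%.pdf}"
}

# Hypothetical helper: convert every *.pdf in a directory (default: current one).
convert_all() {
  dir=${1:-.}
  for pdf in "$dir"/*.pdf; do
    # Skip the unexpanded pattern when the directory holds no PDFs.
    [ -e "$pdf" ] || continue
    out=$(md_name "$pdf")
    # Same pipeline as the single-file command; report only on success.
    if pdftotext "$pdf" - | pandoc -t markdown -o "$out"; then
      echo "wrote $out"
    fi
  done
}
```

Calling convert_all with no argument would then write one .md file next to each PDF in the current directory.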