Remove new lines from a text file

Posted: 2019-10-08 | Author: slackalaxy | Filed under: office | Tags: line, sed, space, tr

I want to copy some text from a PDF file into a word processor. Copying works, but the line breaks from the PDF layout are copied too, so the pasted text ends up chopped into many short lines. I decided to fix this automatically from the command line. After a bit of searching, it seems that two steps are needed.

First, I copied the text from the PDF and saved it in a plain text file; let’s name it something like lines.txt.

As described in Post 5 of this discussion, let’s add a space at the end of each line and write the result to a new file:

sed 's/$/ /' ./lines.txt > linesSpace.txt

Then, let’s remove the newlines from that file, as suggested in this post:

tr -d '\n' < linesSpace.txt > noNewLines.txt
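To see what the two commands do, here is a small sketch on a made-up sample. The three-line text and the temporary directory are assumptions for illustration only; the file names match the ones used above. For contrast, it also runs tr alone on the original file.

```shell
# Throwaway directory so no real files are touched (assumption for this sketch)
workdir=$(mktemp -d)

# Hypothetical sample standing in for text copied from a PDF
printf 'The quick brown\nfox jumps over\nthe lazy dog.\n' > "$workdir/lines.txt"

# Step 1: append a space to the end of every line
sed 's/$/ /' "$workdir/lines.txt" > "$workdir/linesSpace.txt"

# Step 2: delete every newline character
tr -d '\n' < "$workdir/linesSpace.txt" > "$workdir/noNewLines.txt"

# Result of both steps: one long line
result=$(cat "$workdir/noNewLines.txt")
printf '[%s]\n' "$result"

# For contrast: tr alone, without the space added in step 1
merged=$(tr -d '\n' < "$workdir/lines.txt")
printf '[%s]\n' "$merged"
```

The square brackets are only there to make the edges visible. Note that the joined text ends with a space and has no final newline, since the last line gets a space too and its newline is removed like all the others.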

Without the space added at the end of each line, removing the line breaks would merge the last word of each line with the first word of the next one. I decided to write myself a simple script that combines the two steps:

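#!/bin/bash

echo "Enter full path to your input file:"
read -r INPUT

echo "What will your output file be? Output:"
read -r OUTPUT

# Add a space to the end of each line, then remove the newlines.
# The variables are quoted so that paths containing spaces work.
sed 's/$/ /' "$INPUT" | tr -d '\n' > "$OUTPUT"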
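If typing the paths at a prompt gets tedious, the same pipeline could probably be wrapped in a small function that takes the input file as an argument. This is only a sketch: the name join_lines and the sample file are hypothetical, not part of the script above. The argument is quoted so that a file name with a space in it still works.

```shell
#!/bin/bash

# join_lines is a hypothetical helper name used for this sketch.
# It reads the file given as the first argument and prints its
# contents as one line: a space is added to the end of each line,
# then all newlines are removed.
join_lines() {
    sed 's/$/ /' "$1" | tr -d '\n'
}

# Demo on a made-up sample file whose name contains a space
sample="$(mktemp -d)/my lines.txt"
printf 'first row\nsecond row\n' > "$sample"

joined=$(join_lines "$sample")
printf '[%s]\n' "$joined"
```

To write to a file instead of the terminal, redirect the output, for example join_lines lines.txt > noNewLines.txt.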

That’s it!
