Remove newlines from a text file
Posted: 2019-10-08 Filed under: office | Tags: line, sed, space, tr
I want to copy some text from a PDF file into a word processor. Copying works, but the line breaks from the PDF layout are copied too. Every line of the PDF ends up as a separate short row instead of flowing as one paragraph. I decided to fix this automatically on the command line. After a bit of searching, it seems that two steps are needed.
First, I copied the text from the PDF and saved it in a plain text file, named something like lines.txt.
As described in Post 5 of this discussion, the first step adds a space to the end of each line and writes the result to a new file:
sed 's/$/ /' ./lines.txt > linesSpace.txt
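To check what this `sed` substitution does, here is a small sketch that runs the same expression on a hypothetical three-line sample held in a variable instead of lines.txt. It prints each resulting line between brackets so the appended space is visible. The sample text and variable names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sample standing in for lines.txt (three lines copied from a PDF)
sample=$'The quick brown\nfox jumps over\nthe lazy dog.'
# Append one space to the end of every line ($ matches the end of the line)
spaced=$(printf '%s\n' "$sample" | sed 's/$/ /')
# Print each line between brackets so the trailing space becomes visible
while IFS= read -r line; do
  printf '[%s]\n' "$line"
done <<< "$spaced"
```

This prints `[The quick brown ]`, `[fox jumps over ]` and `[the lazy dog. ]`, one per line.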
Then, remove the newlines from that file, as suggested in this post:
tr -d '\n' < linesSpace.txt > noNewLines.txt
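A quick sketch of what `tr -d '\n'` does to a hypothetical two-line sample, once without and once with the trailing spaces from the `sed` step. `tr -d` deletes every newline character and leaves everything else untouched. The `|` printed at the end only marks where each result stops.

```shell
#!/bin/bash
# Hypothetical sample: the same two lines, without and with trailing spaces
plain=$'quick brown\nfox jumps\n'
spaced=$'quick brown \nfox jumps \n'
# Delete every newline character and nothing else
without_space=$(printf '%s' "$plain" | tr -d '\n')
with_space=$(printf '%s' "$spaced" | tr -d '\n')
printf '%s|\n' "$without_space"   # prints: quick brownfox jumps|
printf '%s|\n' "$with_space"      # prints: quick brown fox jumps |
```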
Without the spaces added in the first step, removing the line breaks would merge the last word of each line with the first word of the next one. I wrote a simple script that combines both steps:
#!/bin/bash
# Ask for the input and output file paths
echo "Enter full path to your input file:"
read -r INPUT
echo "What will your output file be? Output:"
read -r OUTPUT
# Add a space to the end of each row, then delete all newlines
sed 's/$/ /' "$INPUT" | tr -d '\n' > "$OUTPUT"
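If you would rather pass the file names as arguments than type them at prompts, the same pipeline can be wrapped in a function. This is a minimal sketch: the function name `join_lines` is hypothetical, and the example run uses a throwaway sample file in a temporary directory created with `mktemp -d`.

```shell
#!/bin/bash
# join_lines: hypothetical non-interactive variant of the script above
# Usage: join_lines INPUT OUTPUT
join_lines() {
  local input="$1" output="$2"
  if [ ! -r "$input" ]; then
    echo "join_lines: cannot read '$input'" >&2
    return 1
  fi
  # Append a space to each line, then delete every newline character
  sed 's/$/ /' "$input" | tr -d '\n' > "$output"
}

# Example run on a sample file in a temporary directory
tmpdir=$(mktemp -d)
printf 'copied from\na PDF page\n' > "$tmpdir/lines.txt"
join_lines "$tmpdir/lines.txt" "$tmpdir/noNewLines.txt"
result=$(cat "$tmpdir/noNewLines.txt")
rm -r "$tmpdir"
printf '%s|\n' "$result"   # prints: copied from a PDF page |
```

Note that the output keeps one trailing space after the last word, because `sed` also appends a space to the final line.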
That’s it!