List of useful commands

Here is a list of commands that I use on a regular basis to explore texts:

grep -Ir --exclude="*\.svn*" "pattern" *
It searches recursively, ignores binary files, and doesn’t look inside Subversion hidden folders.
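A small sketch of an alternative, assuming a grep that supports --exclude-dir (GNU grep does): since --exclude is matched against file names, --exclude-dir may be the more direct way to skip the .svn folders themselves. The throwaway tree and file names below are made up for the demonstration.

```shell
# Build a small throwaway tree with a fake .svn folder (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/.svn"
echo "pattern here" > "$demo/src/notes.txt"
echo "pattern here" > "$demo/.svn/entries"

# -I skips binary files, -r recurses, -l lists only the names of matching files.
matches=$(cd "$demo" && grep -Irl --exclude-dir=.svn "pattern" .)
echo "$matches"   # prints ./src/notes.txt
```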

grep with colour option:

grep -ri --color=auto "needle" .

Only tex files:

grep -ir --color=auto --include='*.tex' 'needle' .

Searching in the BAWE corpus for reflection:

grep -ri --color=auto '<note.*reflection.*</note>' .
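To see what this pattern does and does not catch, here is a minimal sketch on a few invented lines (they are not actual BAWE markup, just an assumption about what a note element might look like). Because grep works line by line, the pattern only matches when the opening tag, the word "reflection" and the closing tag all sit on the same line.

```shell
# Invented sample lines; the real corpus markup may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<note type="reflection">What I learned</note>
<note type="footnote">See appendix</note>
<note>
a reflection split over several lines
</note>
EOF

# -c counts matching lines, -i ignores case.
hits=$(grep -ci '<note.*reflection.*</note>' "$sample")
echo "$hits"   # prints 1: only the first line has everything on one line
```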

On Mac OS X, sed -i needs an explicit "extension" argument for the backup file (it may be empty, as in sed -i ''); otherwise sed will choke.
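One form that both GNU sed and the BSD sed on Mac OS X accept is a backup suffix attached directly to -i. A minimal sketch, with a made-up temporary file and a made-up substitution:

```shell
# Create a scratch file to edit in place.
f=$(mktemp)
echo "hello world" > "$f"

# -i.bak edits the file in place and keeps the original as "$f.bak".
sed -i.bak 's/world/there/' "$f"

# Remove the backup once the result looks right.
rm -f "$f.bak"
cat "$f"   # prints: hello there
```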

How to diff a folder tree: (from http://hints.macworld.com/article.php?story=20070408062023352)

diff -rq directory1 directory2

diff -qr dirA dirB | grep -v -e 'DS_Store' -e 'Thumbs' |
sort > diffs.txt
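A quick way to try the folder diff without touching real data is to build two tiny directories first. This is only a sketch; the directory and file names are invented for the demonstration.

```shell
# Two throwaway folders with one identical file, one changed file,
# one file that exists only in dirA, and a .DS_Store to be filtered out.
base=$(mktemp -d)
mkdir -p "$base/dirA" "$base/dirB"
echo one > "$base/dirA/same.txt";    echo one > "$base/dirB/same.txt"
echo old > "$base/dirA/changed.txt"; echo new > "$base/dirB/changed.txt"
echo x > "$base/dirA/only_in_a.txt"
touch "$base/dirA/.DS_Store"

# -q reports only whether files differ, -r recurses into subfolders.
report=$(cd "$base" && diff -qr dirA dirB | grep -v -e 'DS_Store' -e 'Thumbs' | sort)
echo "$report"
```

The report should mention changed.txt and only_in_a.txt, but neither same.txt nor .DS_Store.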

Convert all pdfs in a directory to text with tika:

#!/bin/bash
for f in /path/to/*.pdf
do
  echo "Processing $f file..."
  # take action on each file; $f stores the current file name
  tika.jar....
done
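One way the loop body could be filled in, as a sketch only: it assumes the Tika app jar is run with java -jar and its --text option, and the jar location in TIKA_JAR is hypothetical. The function names extract_text and convert_pdfs are mine, not part of Tika.

```shell
# Hypothetical location of the Tika app jar; adjust to your install.
TIKA_JAR="${TIKA_JAR:-$HOME/tika-app.jar}"

# Write the plain text of one PDF to stdout (assumes Tika's --text option).
extract_text() {
  java -jar "$TIKA_JAR" --text "$1"
}

# Convert every PDF in a directory to a .txt file next to it.
convert_pdfs() {
  local dir="$1" f
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue          # no PDFs: the glob stayed literal
    echo "Processing $f file..."
    extract_text "$f" > "${f%.pdf}.txt"
  done
}

# Usage: convert_pdfs /path/to
```

Quoting "$f" keeps file names with spaces in one piece, and ${f%.pdf}.txt swaps the extension for the output file.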

Recursive find and replace:

find . -type f -print0 | xargs -0 sed -i 's/\/Users\/Documents/\/home\/atom\/Documents/g'

Testing of find and replace pattern:

echo '/Users/Documents/thesis' | sed 's/\/Users\/Documents/\/home\/Documents/g'
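The escaped slashes are easy to get wrong. A sketch of the same substitution with | as the delimiter, followed by a dry run of the recursive version on a throwaway folder (the sandbox file is invented; -i.bak is used because it works with both GNU and BSD sed):

```shell
# With | as delimiter the slashes in the paths need no escaping.
result=$(echo '/Users/Documents/thesis' | sed 's|/Users/Documents|/home/Documents|g')
echo "$result"   # prints /home/Documents/thesis

# Try the recursive version on a scratch copy before touching real files.
sandbox=$(mktemp -d)
echo 'see /Users/Documents/thesis' > "$sandbox/a.txt"
find "$sandbox" -type f -print0 | xargs -0 sed -i.bak 's|/Users/Documents|/home/atom/Documents|g'
cat "$sandbox/a.txt"   # prints: see /home/atom/Documents/thesis
```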

Encrypting pdf documents:

yum install http://tree.repoforge.org/redhat/el6/en/x86_64/rpmforge/RPMS/pdftk-1.44-2.el6.rf.x86_64.rpm

pdftk notencrypted.pdf output encrypted.pdf user_pw secretpassword

Copy all files from a server, while preserving timestamp:

rsync -chavzP --stats root@xxx.xxx.x.x:/home/tom/Documents/ .

Merging pdf files with gs:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=combined.pdf *

Copy a file from the terminal into the clipboard (first install xclip):

cat big.json | xclip -selection clipboard

Replacing cells containing text within LaTeX tables with a checkmark (TeXstudio):

This replaces a cell such as & text,text,text & with & \ding{51} (\ding comes from the pifont package).
Search pattern: &([^&]*)

Replacement: & \ding{51}