Here is a list of commands that I use on a regular basis to explore texts:
grep -Ir --exclude-dir=".svn" "pattern" *
It searches recursively, ignores binary files, and doesn’t look inside Subversion hidden folders.
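A minimal sketch of that behaviour on a throwaway directory tree. The directory and file names are made up for the demo, and the sketch assumes a grep that supports --exclude-dir (GNU grep does):

```shell
# Build a small demo tree (hypothetical names) in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/.svn" "$demo/src"
echo "pattern here" > "$demo/src/notes.txt"       # should be found
echo "pattern here" > "$demo/.svn/entries"        # inside .svn, should be skipped
printf 'pattern\0binary' > "$demo/src/blob.bin"   # contains a NUL byte, treated as binary

# -I skips binary files, -r recurses, -l lists only the matching file names.
matches=$(cd "$demo" && grep -Irl --exclude-dir=".svn" "pattern" . | sort)
echo "$matches"
```

Only the plain text file outside .svn should be listed.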
grep with the colour option:
grep -ri --color=auto "needle" .
Only .tex files:
grep -ir --color=auto --include='*.tex' 'needle' .
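To see the effect of --include, here is a small sketch on two made-up files; only the one ending in .tex should be reported, even though both contain the search term:

```shell
# Two demo files (hypothetical names), both containing the word "needle".
proj=$(mktemp -d)
echo 'A needle in LaTeX' > "$proj/chapter.tex"
echo 'A needle in the log' > "$proj/build.log"

# --include restricts the recursive search to file names matching the glob.
found=$(cd "$proj" && grep -irl --include='*.tex' 'needle' .)
echo "$found"
```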
Searching in the BAWE corpus for reflection:
grep -ri --color=auto '<note.*reflection.*</note>' .
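A sketch of how that pattern behaves. The two sample lines are invented for illustration and are not actual BAWE markup; the point is only that -i makes "Reflection" match and that a note without the word is ignored:

```shell
# Invented sample files standing in for corpus documents.
corpus=$(mktemp -d)
printf '%s\n' '<note type="Reflection">Looking back on the project</note>' > "$corpus/a.xml"
printf '%s\n' '<note type="abstract">Summary</note>' > "$corpus/b.xml"

# -l lists the files that contain at least one matching line.
hits=$(cd "$corpus" && grep -ril '<note.*reflection.*</note>' .)
echo "$hits"
```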
On Mac OS X, sed -i needs an "extension" argument for the backup file (an empty string '' is allowed if no backup is wanted); otherwise sed will choke.
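One way to sidestep the difference, sketched on a throwaway file: attaching the suffix directly to -i (here .bak, an arbitrary choice) is accepted by both GNU sed and the BSD sed on Mac OS X, at the cost of leaving a backup file behind.

```shell
# Throwaway file to edit in place.
tmp=$(mktemp)
echo "hello world" > "$tmp"

# -i.bak edits the file in place and keeps the original as "$tmp.bak".
sed -i.bak 's/world/there/' "$tmp"

result=$(cat "$tmp")
backup=$(cat "$tmp.bak")
echo "$result"
```

The backup can be deleted afterwards once the result has been checked.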
How to diff a folder tree: (from http://hints.macworld.com/article.php?story=20070408062023352)
diff -rq directory1 directory2
diff -qr dirA dirB | grep -v -e 'DS_Store' -e 'Thumbs' | sort > diffs.txt
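A small sketch of the filtered comparison on two made-up directory trees. The LC_ALL=C prefix is an addition of this sketch so that diff prints its messages in English:

```shell
# Two demo trees: one shared file, one file only in dirA, plus Finder clutter.
work=$(mktemp -d)
mkdir -p "$work/dirA/sub" "$work/dirB/sub"
echo "same" > "$work/dirA/sub/common.txt"
echo "same" > "$work/dirB/sub/common.txt"
echo "only in A" > "$work/dirA/extra.txt"
touch "$work/dirA/.DS_Store"

# -q reports only whether files differ, -r recurses; grep -v drops the clutter.
report=$(cd "$work" && LC_ALL=C diff -qr dirA dirB | grep -v -e 'DS_Store' -e 'Thumbs' | sort)
echo "$report"
```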
Convert all PDFs in a directory to text with Tika:
#!/bin/bash
FILES=/path/to/*.pdf
for f in $FILES
do
  echo "Processing $f file..."
  # take action on each file. $f stores the current file name
  tika.jar....
done
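The loop can be made robust against file names with spaces by quoting "$f" and globbing directly in the for statement. In the sketch below, convert_one is a hypothetical stand-in for the real conversion step; the Tika call in the comment is an assumption about how it might be invoked (jar location made up), not the command used above.

```shell
# convert_one is a hypothetical placeholder for the real conversion.
# With the Tika app jar it might look like this (assumed jar path):
#   java -jar /path/to/tika-app.jar --text "$1" > "${1%.pdf}.txt"
convert_one() {
  printf 'converted from %s\n' "$(basename "$1")" > "${1%.pdf}.txt"
}

# Demo directory with two empty stand-in PDFs, one with a space in its name.
pdfdir=$(mktemp -d)
touch "$pdfdir/thesis draft.pdf" "$pdfdir/notes.pdf"

count=0
for f in "$pdfdir"/*.pdf; do
  [ -e "$f" ] || continue        # skip the literal pattern if nothing matched
  echo "Processing $f file..."
  convert_one "$f"               # writes a .txt file next to the PDF
  count=$((count + 1))
done
echo "Processed $count files"
```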
Recursive find and replace:
find . -type f -print0 | xargs -0 sed -i 's/\/Users\/Documents/\/home\/atom\/Documents/g'
Testing of find and replace pattern:
echo '/Users/Documents/thesis' | sed 's/\/Users\/Documents/\/home\/Documents/g'
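Escaping every slash is easy to get wrong. sed accepts other delimiters after the s, so the same kind of test can be written with | instead of /. A minimal sketch, using the /home/atom/Documents target from the recursive command:

```shell
# With | as the delimiter, the slashes in the paths need no escaping.
result=$(echo '/Users/Documents/thesis' | sed 's|/Users/Documents|/home/atom/Documents|g')
echo "$result"
```

Once the output looks right, the same expression can be dropped into the find and xargs pipeline.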
Encrypting pdf documents:
yum install http://tree.repoforge.org/redhat/el6/en/x86_64/rpmforge/RPMS/pdftk-1.44-2.el6.rf.x86_64.rpm
pdftk notencrypted.pdf output encrypted.pdf user_pw secretpassword
Copy all files from a server, while preserving timestamp:
rsync -chavzP --stats root@xxx.xxx.x.x:/home/tom/Documents/ .
Merging PDF files with gs:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=combined.pdf *.pdf
Copy a file from the terminal into the clipboard (install xclip first):
cat big.json | xclip -selection clipboard
Replacing cells containing text within LaTeX tables (TeXstudio):
Replace cells in LaTeX with a checkmark. This replaces & text,text,text & with & \ding{51}
Search pattern: &([^&]*)
Replacement: & \ding{51}
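The same search and replace can be tried outside TeXstudio. The sketch below applies it to one made-up table row with sed -E; it assumes the first text cell is the one to replace, and it adds a trailing space to the replacement to keep the row readable. Without the g flag only the first match is changed, which keeps the closing \\ of the row intact.

```shell
# A made-up table row; the single quotes keep the backslashes literal.
row_in='Feature & text,text,text & other \\'

# In the replacement, \& is a literal ampersand and \\ a literal backslash.
row=$(printf '%s\n' "$row_in" | sed -E 's/&([^&]*)/\& \\ding{51} /')
printf '%s\n' "$row"
```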