Tuesday, November 07, 2017
Rich Hickey's talk: Effective Programs - 10 Years of Clojure (video, transcript, discussions on hackernews) brings lots of discussions about types
- Eric Normand wrote Clojure vs. The Static Typing World
- Dueling Rhetoric of Clojure and Haskell, discussions on hackernews and proggit (of course clojure and haskell subreddits
- On Types And Intent, discussions onproggit, haskell and clojure
and I think this is also related:
- Alan Kay on Lisp, discussions on hackernews and proggit
couple new tools for clojure:
- Lightmod, an all-in-one tool for full stack Clojure
- and the most exciting one: dundalek/closh: Bash-like shell based on Clojure
and again, Clojure transducers from the ground up: using them in practice.
some good reads from past couple weeks:
- Ten Things I Wish I’d Known Before Using Vagrant
- Why does one NGINX worker take all the load?
- You need more than one AWS account: AWS bastions and assume-role
- Getting the Most out of Sqlite3 with Python
- What every systems programmer should know about lockless concurrency (pdf)
after read Learn to use Awk with hundreds of examples and Why Learn AWK?
I'm reading The AWK Programming Language (download pdf from archive.org) again
start coding in python, because all my teammates are python people
surely emacs has great support on *python, install elpy
, company-jedi
packages
then I can eval the buffer by c-c c-c
, just like cider, (more on elpy)
also support virtualenv
, once you created a virtualenv
folder in your project, use m-x pyvenv-activate
to select it and your repl now works within the env.
a snippet for catch python exceptions with traceback:
import logging
logging.basicConfig(filename='/tmp/out.log', level=logging.ERROR)
try:
run_my_stuff()
except:
logging.exception('Got exception on main handler')
I'm learning to write good python codes as well:
- Code Like a Pythonista: Idiomatic Python
- My Favorite Underused Python Idiom
- Top 10 Python idioms I wish I'd learned earlier
Monday, November 20, 2017
a awk script to extract slow queries:
awk '/Query_time: [2-9]{2,}/{a=1;print "";print NR;print l;print;next}/Query_time/{a=0}/Id:/{l=$0;next}a>0' mysql-slowquery.log
some notes when learning postgresql:
switch to postgres
user:
> sudo -iu postgres
> createuser --interactive
> dropuser jim
> createdb jim
> dropdb jim
include sql file # \i /path/to/file.sql
describe tables
# \d
# \dt
(without seq tables)# \d+ table_name
(likedesc table_name
)
import from csv file:
# copy table_name from '/path/to/csv_file' CSV HEADER;
to use crosstab
, first enable tablefunc
:
# CREATE extension tablefunc;
The SELECT
parameter feed to crosstab
must return 3 columns:
- as identifier
- as category
- value
use the crosstab function with a second parameter, which represents the complete list of categories.
export to csv
COPY (select id,name from tablename) TO 'filepath/aa.csv' DELIMITER ',' CSV HEADER;
reference:
- PostgreSQL: Documentation: 10: F.39. tablefunc
- Pivot Tables in PostgreSQL Using the Crosstab Function
I also start learning jupyter (ipython
, pandas
etc.)
start: jupyter notebook
- edit mode: green border
- command mode: blue border
shortcuts:
Esc
: switch to command mode[Shift+Enter]
: run that cell (and select the cell below it; or, use [Control+Enter] if you want to stay in that cell); this works even if you are in Edit Modex
: delete that cella
: create a new cell aboveb
: create a new cell belowm
: make that cell a markdown celll
: add line numbers (useful when working with others)y
: make that cell a code cell (which is the default for a new cell)[Enter]
: enter Edit Mode (turning the cell green and putting the cursor inside)↑ ↓
: use the up and down arrows to move through the cells[Number]
: type a number (i.e., 1, 2, 3,…) to add a heading with that size (then use Enter to get into Edit Mode)
Cell shortcuts:
Shortcut Action
Shift-Enter run cell
Ctrl-Enter run cell in-place
Alt-Enter run cell, insert below
Ctrl-m x cut cell
Ctrl-m c copy cell
Ctrl-m v paste cell
Ctrl-m d delete cell
Ctrl-m z undo last cell deletion
Ctrl-m - split cell
Ctrl-m a insert cell above
Ctrl-m b insert cell below
Ctrl-m o toggle output
Ctrl-m O toggle output scroll
Ctrl-m l toggle line numbers
Ctrl-m s save notebook
Ctrl-m j move cell down
Ctrl-m k move cell up
Ctrl-m y code cell
Ctrl-m m markdown cell
Ctrl-m t raw cell
Ctrl-m 1-6 heading 1-6 cell
Ctrl-m p select previous
Ctrl-m n select next
Ctrl-m i interrupt kernel
Ctrl-m . restart kernel
Ctrl-m h show keyboard shortcuts
for db password, use dotenv
pip3 install python-dotenv
%load_ext dotenv
%dotenv
pandas
axis:
+------------+---------+--------+
| | A | B |
+------------+---------+---------
| 0 | 0.626386| 1.52325|----axis=1----->
+------------+---------+--------+
| |
| axis=0 |
↓ ↓
Thursday, November 23, 2017
continue playing jupyter and pandas:
jupyter
magics
%who list var names
%whos list var names with data
%who_ls Return a sorted list of all interactive variables
%edit This will open an editor
%hist This presents a list of the last several input command lines.
%rerun Re-run previous input
%save [options] filename n1-n2 n3-n4 ... n5 .. n6 ...
%notebook filename: Export and convert IPython notebooks.
%dhist history of visited directories
use %cd -<n> to go to directory number <n>
access cell history:
_ previous cell output
__ next previous
___ next next previous
_13 output of cell 13
_i13 content of input cell 13
In[13] input of cell 13
Out[13] output of cell 13
In[13:18] range
exec(In[13]) execute cell 13
Once you run
%rehashx
, all of your $PATH has been loaded as IPython aliases
SList
String lists (IPython.utils.text.SList) are handy way to process output from system commands.
They are produced by var = !cmd
syntax.
SList (.p, .n, .l, .s, .grep(), .fields() available
lines = !ls -l
lines.grep('embed',prune=1)
lines.fields(8,0)
(8th and first fields)- `lines.fields(*range(1,8))
lines.grep(r'^\?').fields(1)
The .s
, .n
, .p
properties
- The
.s
property returns one string where lines are separated by single space - The
.n
property return one string where the lines are separated by a newline - If the items in string list are file names,
.p
can be used to get a list of “path” objects for convenient file manipulation.
lines.append('hey')
save session
%save -a [filename] 1 10-16 20
%save -a [filename] ~0
pandas
connect mysql:
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('mysql://user:pass@host:port/db')
f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col = 'ID')
glob files:
import os
from glob import glob
data_dir = '/data'
data_files = [f for f in glob(os.path.join(data_dir, '*.gz'))]
read gzip files, first 2 lines:
import gzip
with gzip.open('/tmp/foo.gz') as f:
head = [next(f) for x in range(2)]
field_names = head[1].decode('ascii')
combine multiple dataframes:
import pandas as pd
df = pd.concat([pd.read_table(f, compression='gzip', skiprows=2, names=field_names) for f in data_files])
also reading Python for Data Analysis, 2nd Edition
an old article (2015) shows up on hackernews: A CEO's Guide to Emacs
as usual, can find some useful emacs packages in the discussions:
- jacktasia/dumb-jump: an Emacs "jump to definition" package
- abo-abo/avy: Jump to things in Emacs tree-style
I tried dumb-jump, good, it just works.
avy looks as same as ace-jump, which I'm using.
Wednesday, November 29, 2017
some notes on parallel
command:
from https://github.com/LangilleLab/microbiome_helper/wiki/Quick-Introduction-to-GNU-Parallel
- The file name:
{}
- The file name with the extension removed:
{.}
- To indicate that everything that follows should be read in from the command line:
:::
parallel --eta -j 2 --load 80% --noswap 'blastp -db pdb_blast_db_example/pdb_seqres.txt -query {} -out blastp_outfiles/{.}.out -evalue 0.0001 -word_size 7 -outfmt "6 std stitle staxids sscinames" -max_target_seqs 10 -num_threads 1' ::: test_seq*.fas
--eta
: Shows the estimated time remaining to run all jobs.-j 2
(or--jobs 2
): The number of commands to run at the same time--load 80%
: The maximum CPU load at which new jobs will not be started--noswap
: New jobs won't be started if there is both swap-in and swap-out activity
You can put options you want to use every time in a global configuration file here: /etc/parallel/config
above example can use pipe
:
ls test_seq*.fas | parallel --eta -j 2 --load 80% --noswap 'blastp -db pdb_blast_db_example/pdb_seqres.txt -query {} -out blastp_outfiles2/{.}.out -evalue 0.0001 -word_size 7 -outfmt "6 std stitle staxids sscinames" -max_target_seqs 10 -num_threads 1'
or call command in a file:
for f in test_seq*.fas; do
out=${f/.fas/.out};
echo "blastp -db pdb_blast_db_example/pdb_seqres.txt -query $f -out blastp_outfiles3/$out -evalue 0.0001 -word_size 7 -outfmt \"6 std stitle staxids sscinames\" -max_target_seqs 10 -num_threads 1" >> blastp_cmds.txt
done
cat blastp_cmds.txt | parallel --eta -j 2 --load 80% --noswap '{}'
In this case since the whole command is being input we can just refer to the input itself with {}
.
from https://www.biostars.org/p/63816/:
It is often faster to write a command using GNU Parallel than making a for loop:
for i in *gz; do
zcat $i > $(basename $i .gz).unpacked
done
can be written as:
parallel 'zcat {} > {.}.unpacked' ::: *.gz
Blast on multiple machines
cat 1gb.fasta | parallel -S :,server1,server2 --block 100k --recstart '>' --pipe blastp -evalue 0.01 -outfmt 6 -db db.fa -query - > result
Running composed commands
parallel 'read_fasta -i {} | extract_seq -l 5 | write_fasta -o {.}_trim.fna -x' ::: *.fna
TIL, python 3's print()
function can write to file:
with open("test", 'w') as f:
print('No new line,', file=f, end='')
print('Has new line', file=f)
played a little bit with sipwitch
, there're so little information on the web, luckily, I found some detailed setup guide on Raspberry Pi for Secret Agents
I don't remember whether I read this before: Data Science at the Command Line
I'll go over the list again.
Peter Norvig's pytudes: Python programs to practice or demonstrate skills.
TIL github supports .ipynb file, very good for sharing document mixed with codes.
Blog Archive
- Newer Entries
- 2017 December
- 2018 January
- 2018 February
- 2018 March
- 2018 April
- 2018 May
- 2018 June
- 2018 July
- 2018 August
- 2018 September
- 2018 October
- 2018 November
- 2018 December
- 2019 January
- 2019 February
- 2019 March
- 2019 April
- 2019 May
- 2019 July
- 2019 October
- 2019 November
- 2019 December
- 2020 August
- 2020 September
- 2020 October
- 2020 November
- 2020 December
- 2021 January
- 2021 February
- 2021 March
- 2021 April
- 2021 May
- 2021 June
- 2021 August
- 2021 September
- 2021 December
- 2022 March
- 2022 April
- 2022 May
- 2022 June
- 2022 July
- 2022 August
- 2022 September
- 2022 October
- 2022 November
- 2022 December
- 2023 January
- 2023 February
- 2023 March
- 2023 April
- 2023 July
- 2023 August
- 2023 September
- 2023 October
- 2023 November
- 2023 December
- 2024 January
- 2024 February
- 2024 March
- 2024 April
- 2024 May
- 2024 June
- 2024 August
- 2024 September
- 2024 October
- 2024 November
- 2024 December
- Older Entries
- 2017 October
- 2017 September
- 2017 August
- 2017 July
- 2017 June
- 2017 May
- 2017 April
- 2017 March
- 2017 February
- 2017 January
- 2016 December
- 2016 November
- 2016 October
- 2016 September
- 2016 August
- 2016 July
- 2016 June
- 2016 May
- 2016 April
- 2016 March
- 2016 February
- 2016 January
- 2015 December
- 2015 November
- 2015 October
- 2015 September
- 2015 August
- 2015 July
- 2015 June
- 2015 May
- 2015 April
- 2015 March
- 2015 February
- 2015 January
- 2014 December
- 2014 November
- 2014 October
- 2014 September
- 2014 August
- 2014 March
- 2014 February
- 2014 January
- 2013 December
- 2013 October
- 2013 July
- 2013 June
- 2013 May
- 2013 March
- 2013 February
- 2013 January
- 2012 December
- 2012 November
- 2012 October
- 2012 September
- 2012 August