pkb contents > python | just under 4704 words | updated 10/24/2017
python --version
python # launches some version of python 2
python3 # launches some version of python 3
python <script_name>.py # run a script
IPython is a special shell that adds some functionality versus the normal Python shell:
$ ipython
# this code will run only if the script is executed from the command line
# it won't run if the script is imported by another script
if __name__ == '__main__':
    import sys
    script_name = sys.argv[0]
    for arg in sys.argv[1:]:
        print(arg) # handle each command-line argument
Jupyter Notebook has two modes: command mode, where you're manipulating cells (access via the 'esc' key), and edit mode, where you're working inside them (access via the 'enter' key). Shortcuts (view all with 'h' in command mode): 'dd' deletes a cell.
jupyter notebook # launches JN in browser
# quit JN by typing ctrl+c twice in the command line
# share JN by uploading it to GitHub -->
<object>? # view help
<object.*ing> # wildcard match
<object>?? # view source code
!<shell command> # calling a Bash command from Jupyter Notebook
x = !cat fname.txt # saving results of Bash command to a Python variable
!cat {my_fname} # passing contents of Python variable to a Bash command
# profile code
%timeit <code>
%%file fname.ext # create a file
%run <script_name>.py # run a script
%matplotlib # graphing options
Jake VanderPlas' code:
# create a class which defines the _repr_html_ method, returning a string of HTML
class RedText(object):
    def __init__(self, text):
        self.text = text
    def _repr_html_(self):
        return "<font color='red' size=24>" + str(self.text) + "</font>"
RedText('hello there')
# example 2
class ListDisplay(object):
    def __init__(self, L):
        self.L = L
    def _repr_html_(self):
        output = '<ul>'
        for value in self.L:
            output += "<li>" + str(value) + "</li>"
        output += "</ul>"
        return output
my_list = [1, 2, 3]
Jake VanderPlas: "transform simple Python functions into interactive widgets"
# Initial install
!conda install ipywidgets
!jupyter nbextension enable --py widgetsnbextension
from ipywidgets import interact
Libraries/packages are directories of Python scripts/modules; each script contains special functions, methods, and/or types.
pip3 install module_name
import module_name # use functions from module as module_name.function_name()
import module_name as nickname # use functions as nickname.function_name()
from module_name import function_name # partial import; use function as function_name()
from module_name import * # full import; bad practice because:
# (1) it floods the local namespace;
# (2) names you’ve defined locally or have previously imported may be overwritten;
# (3) the module's contents are no longer contained in the module's namespace
dir(module_name) # view contents of module
See also: Doug Hellmann's Python Module of the Week, SciPy's directory of science-related Python resources and modules, Fredrik Lundh's tour of the Python standard library modules [pdf], the Python Module Index, and libraries included in the ActivePython and Anaconda Python distributions:
collections (defaultdict), pandas (dataframes), numpy (arrays), GraphLab Create
datetime, pytz
urllib2, requests, scrapy, beautifulsoup, robobrowser
csv, json, lxml
math, statistics, random, numpy
matplotlib, seaborn, prettytable, tablib, bokeh (interactives)
scipy (integrals, diffeqs, matrixes)
scikit-learn, GraphLab Create
nltk, re, string
operator, functools
nose, logging, coverage, unittest, exceptions, pdb
import this
# ASSIGNMENT operators
a = 1
b = 2; c = 3
d = a == b # returns False
# COMPARISON operators
a == b # checking equality/equivalency; returns False, since a = 1 and b = 2
a != b # returns True
a is b # checking identicality; returns False
a is not b # returns True
a <= b # inequality
a < b # strict inequality
# LOGIC operators are used for selection and filtering;
# also with conditional operators to control program flow, although
# multiline if/else expressions are often more readable than complex Booleans
if x > y or y != 1:
between1_and5 = [i for i in my_list if i > 1 and i < 5] # a list comprehension
# avoid negation of positive expressions, e.g. if not a is b
# prefer inline negation:
if x is not y:
all(my_iterable) # returns True if my_iterable is empty, or all its elements are True
any(my_iterable) # returns False if my_iterable is empty; returns True if any element is truthy
# CONDITIONAL operators:
if x > 2:
    continue # jumps to next iteration
elif x < 0:
    if x == -1: # conditionals can be nested
        break # completely exits loop
# if included, the else clause must come last: if, elif, else
# there is a conditional execution structure for errors;
# this is called catching an exception:
try: # to run code based on input
except: # ask for better input
else: # execute if try was successful; visually distinguishes the success case
finally: # always runs, whether or not an exception occurred, e.g. close file handles
# DEFINITE loop:
for i in [set]: …
# or range(x) in python3 // xrange(x) in python2
# or range(len(my_list)) in python3 // xrange(len(my_list)) in python2
# or i, em in enumerate(my_iterable, [starting_index])
while [condition]: …
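The try/except/else/finally structure listed above can be sketched as follows. This is only an illustration: `parse_age` and the branch labels it records are hypothetical names, not part of the original notes.

```python
def parse_age(text):
    """Convert text to an int and record which branches ran (hypothetical helper)."""
    steps = []
    try:
        age = int(text)            # may raise ValueError
    except ValueError:
        steps.append('except')     # runs only if the try block raised
        age = None
    else:
        steps.append('else')       # runs only if the try block succeeded
    finally:
        steps.append('finally')    # always runs, on success or failure
    return age, steps

print(parse_age('42'))    # (42, ['else', 'finally'])
print(parse_age('abc'))   # (None, ['except', 'finally'])
```

Catching a specific exception (here `ValueError`) rather than using a bare `except:` keeps unrelated errors visible.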
For loops can often be replaced with a comprehension. Comprehensions can be fairly complex, but at a certain point it's better to switch back to a loop.
squares = [i**2 for i in range(10)] # list comprehension
squares3 = [i**2 for i in range(30) if i%3==0] # conditional list comprehension
two_filter = [x for x in a if x>4 if x%2==0] # multiconditional list comprehension
squared = [[x**2 for x in row] for row in matrix] # nested list comprehensions
grid_list = [(x, y) for x in rows for y in cols] # list comprehension returns list of tuples
doubled_set = {num * 2 for num in [5, 2, 18, 2, 42, 2]} # set comprehension; naming it set would shadow the builtin
# dictionary comprehensions
squares_dict = {i: i**2 for i in range(20)}
letters_dict = {letter: num for letter, num in zip('abcdef', range(1, 7))}
transposed_dict = {letters_dict[key]: key for key in letters_dict}
A generator expression, also called a naked comprehension, is useful for processing large datasets because intermediate results are not stored, so RAM isn't overwhelmed.
it = (len(x) for x in open('myfile.txt'))
roots = ((x, x**0.5) for x in it)
sum(i**2 for i in range(10))
list(my_generator(data)) # to convert generator to list, but why??
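To illustrate why generator expressions save memory, here is a small sketch comparing a list comprehension with the equivalent generator expression. The variable names are made up for the example, and exact byte sizes vary by Python version, so only the comparison is shown.

```python
import sys

squares_list = [i**2 for i in range(10000)]   # all results stored at once
squares_gen = (i**2 for i in range(10000))    # results produced one at a time

# the generator object stays small no matter how many items it will yield
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))   # True

total = sum(squares_gen)        # consumes the generator
leftover = list(squares_gen)    # a generator can only be iterated once
print(total == sum(squares_list), leftover)   # True []
```

The one-pass behavior is the main reason to convert a generator to a list: do it when the results need to be reused or indexed.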
Overview of standard types :
in, not in
print(type(my_var)) # check type
print(repr(my_var)) # printable representation; differentiates '5' and 5 when printing
str() # convert to string
text.decode('utf-8') # convert bytes to unicode
text.encode('utf-8') # convert unicode to bytes
int() # convert numeric to integer
my_int.to_bytes(length, byteorder='big', *, signed=False) # OverflowError if length is too small to hold my_int
int.from_bytes(my_bytes, byteorder='little', *, signed=True) # class method; builds an int from bytes
my_integer.bit_length() # how many bits to represent an integer?
# also bool(), float()
All Python objects have truth values: None, zero for any numeric type, and empty collections evaluate as False; most other objects evaluate as True.
my_float = 5.519
sum(my_iterable) # sum up numerics stored in an iterable
round(my_float[, n]) # round float to n digits; n defaults to 0
# standard numeric operators;
# Python converts types as necessary to perform operations:
2 + 3 # addition
2 - 3 # subtraction
2 * 3 # multiplication
2 ** 3 # exponentiation
6 / 3 # division; always returns a float in python3, here 2.0
7 % 2 # modulo; returns remainder of division, e.g. 7 = 2*3 + 1, the remainder is 1
5 // 2 # floor division, AKA integer division; divides int by int, drops remainder; e.g. 5//2 = 2
divmod(5, 2) # returns (x//y, x%y)
import math
math.pi # a constant
math.e # a constant
math.gcd(my_int1, my_int2) # greatest common divisor; arguments must be integers
math.trunc(my_float) # truncates float to integer part, without rounding
math.floor(my_float) # greatest integer less than or equal to my_float
math.ceil(my_float) # smallest integer greater than or equal to my_float
math.log(my_float[, base])
# the decimal library is useful for currency:
from decimal import Decimal, ROUND_UP
my_decimal_price = Decimal('5.003')
my_decimal_price.quantize(Decimal('0.01'), rounding=ROUND_UP) # returns Decimal('5.01')
# Operations supported for all sequences:
x not in s # membership check; returns True or False
x in s
s * n # repeats s n times, n is an integer
s + t # concatenation
# Addressing for all sequences:
s.index(x[, i[, j]]) # find index of element x between optional i (inclusive) to j (exclusive)
s[i:j:k] # slice s, taking every kth item from index i (inclusive) to j (exclusive)
# More slicing syntax: s[:], s[i:], s[:j], s[-3:-1]
# Stride k can be negative, but keep it positive to avoid confusion
# For readability, consider two statements: one to stride, the next to slice
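The slicing rules above are easier to see on a concrete sequence. The string `s` below is just sample data chosen for the sketch.

```python
s = 'abcdefgh'

print(s[2:5])      # 'cde'  index 2 (inclusive) to 5 (exclusive)
print(s[:3])       # 'abc'
print(s[-3:-1])    # 'fg'
print(s[::2])      # 'aceg'  every 2nd item
print(s[::-1])     # 'hgfedcba'  negative stride reverses
print(s.index('e', 2, 6))   # 4

# stride first, then slice, as two separate statements
evens = s[::2]
print(evens[1:3])  # 'ce'
```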
Like lists, strings are composed of elements that can be accessed via their index. Unlike lists, strings are immutable: individual elements cannot be deleted or modified.
In Python 3, str is unicode and bytes is raw 8-bit. In Python 2, str is raw 8-bit and unicode is unicode.
my_string1 = 'allows embedded "double" quotes'
my_string2 = "allows embedded 'single' quotes"
my_string3 = 'quotes can be \'escaped\' using the backslash character'
print("Hello " + user_name + ", how are you doing?") # string concatenation
# split string
my_str.partition(sep) # returns 3-tuple: (str_before_separator, separator, str_after_separator)
my_str.split(sep=None, maxsplit=-1) # split at every occurrence of sep, or at most maxsplit times
my_str.splitlines([keepends]) # keepends is a Boolean
' '.join(my_iterable) # join elements of an iterable of strings using ' ' as the separator between elements
my_string.replace(old, new[, count]) # optional 'count' specifies #instances to replace
my_string.isalpha() # False if nonalphabetic character in string
my_string.zfill(width) # left-pads a string with zeros
my_string.ljust(width[, fillchar])
# Many of these methods have counterparts that start from the end of the string
# s.rindex(), s.rfind(), s.rpartition(), etc.
Lists store multiple elements of any type, including mixed type and including other lists. Lists are mutable; unlike string methods, most list methods alter the list in-place and return None. Lists are both sequences and containers.
my_list = list()
my_list = []
my_list = list('abc')
my_list = ['a', 'b', 'c']
my_list = [i for i in range(len(n))]
b = a # an ALIAS, not a copy! a is b; changes to b affect a
b = list(a) # copies list; b is equivalent, but not identical to a
b = a[:] # copies list; b is equivalent, but not identical to a
em in my_list # check membership
my_list[i] = em # update element at index i
my_list[i:j] = my_iterable # replace slice i:j, even if len(my_iterable) != len(my_list[i:j])
my_list.insert(index, em) # adds element at index
my_list.append(em) # adds a single element at end of list
my_list1 + my_list2 # returns a new list: my_list1 followed by my_list2
my_list.extend(my_iterable) # adds each element of my_iterable at end of list, in place; faster for large lists
del my_list[i:j]
my_list.pop([i]) # deletes and returns last element, or ith element
' '.join(my_list) # join list elements (must be strings) using ' ' as the separator between elements
my_list.reverse() # reverses list elements in-place
sum(my_list) # if list elements are numerics
my_list.sort() # sorts list elements in place
my_sorted_list = sorted(my_list, reverse=False) # returns sorted copy of unaltered list
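The difference between an alias and a copy, and between `append` and `extend`, is a common source of bugs. A short sketch with made-up variable names:

```python
a = [3, 1, 2]
alias = a            # same object as a
shallow = a[:]       # new list with the same elements

alias.append(4)
print(a)                          # [3, 1, 2, 4]  the alias changed a
print(shallow)                    # [3, 1, 2]     the copy did not change
print(alias is a, shallow is a)   # True False

nested = [1, 2]
nested.append([3, 4])    # appends the list as ONE element
flat = [1, 2]
flat.extend([3, 4])      # appends each element
print(nested, flat)      # [1, 2, [3, 4]] [1, 2, 3, 4]

result = a.sort()        # in-place methods return None
print(result, a)         # None [1, 2, 3, 4]
```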
Use a double-ended queue, a list-like datatype, when you need to quickly insert or remove items from the end and beginning (deques are a stack-queue hybrid):
from collections import deque
my_deque = deque()
my_deque.appendleft(em) # add element to left
my_deque.insert(i, em) # add element at specified index
my_deque.append(em) # add element to right
my_deque.reverse() # reverses elements in place, returning None
my_deque.popleft() # remove and return element from left
my_deque.pop() # remove and return element from right
Use a heap queue when you want a list whose smallest element is always first (the rest is only partially ordered):
import heapq
my_heap = list()
heapq.heappush(my_heap, 3) # add element
heapq.heappush(my_heap, 5)
heapq.heappush(my_heap, 1)
my_list = [3, 5, 1]
heapq.heapify(my_list) # turns my_list into a heap in place; returns None
my_heap[0] # always returns lowest number; here, 1
print(heapq.heappop(my_heap), heapq.heappop(my_heap)) # removes and prints lowest, next lowest, etc.; here 1 3
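As a runnable sketch of the heap operations above, the following uses a made-up `tasks` list. `heapq.nsmallest` is a further standard-library function that is not mentioned in the notes.

```python
import heapq

tasks = [5, 3, 8, 1]
heapq.heapify(tasks)        # rearranges tasks in place; returns None
print(tasks[0])             # 1, the smallest item is always at index 0

heapq.heappush(tasks, 2)
# popping repeatedly yields the items in ascending order
in_order = [heapq.heappop(tasks) for _ in range(len(tasks))]
print(in_order)             # [1, 2, 3, 5, 8]

print(heapq.nsmallest(2, [5, 3, 8, 1]))   # [1, 3]
```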
Tuple addressing works like list addressing; unlike lists, though, tuples are immutable. When comparing tuples, Python proceeds on an index-by-index basis. Tuples are used for composite dictionary keys and multivariable assignment:
my_tuple = 'a',
my_tuple = 'a','b','c','d','e'
my_tuple = tuple(my_iterable)
a = 1,2,
b,c = a # multivariable assignment, aka unpacking a tuple; b=1, c=2
d=b,c # packing a tuple; d=1,2 and a==d
directory_dict[last,first] = 'phone_number' # tuple as composite key
for last, first in directory_dict:
    print(first, last, directory_dict[last, first])
The value of sets is access to set operations; by design, sets lack slicing and indexing:
my_set1 = {'a', 'b'}
my_set2 = set(['a','b','c'])
x in my_set1 # True if x an element of set
words_unique = list(set(words)) # find unique values
my_set1.remove(elem) # raises KeyError if elem not in set
my_set1.pop() # remove and return arbitrary element
my_set1.clear() # deletes all elements
my_set1.isdisjoint(my_set2) # True if nonoverlapping sets
my_set1.union(my_set2) # creates a new set from union of sets
my_set1.intersection(my_set2) # creates a new set from intersection of sets
my_set1.difference(my_set2) # creates a new set: set1 - set2
my_set1.symmetric_difference(my_set2) # creates a new set: (set1-set2)U(set2-set1)
# many of these operations have more mathematical-looking alternative notation:
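The operator forms are `|`, `&`, `-`, and `^`. The sketch below checks each against its method counterpart, using two small sample sets invented for the example.

```python
a = {1, 2, 3}
b = {3, 4}

print((a | b) == a.union(b))                  # True
print((a & b) == a.intersection(b))           # True
print((a - b) == a.difference(b))             # True
print((a ^ b) == a.symmetric_difference(b))   # True
print(sorted(a ^ b))                          # [1, 2, 4]
print(a.isdisjoint({7, 8}))                   # True
```

One difference worth knowing: the methods accept any iterable as their argument, while the operators require both operands to be sets.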
A dictionary maps keys to values; values are retrieved via their key, doing away with indices. A dictionary is much faster to search than a list, and is often used to count letter or word occurrences in a block of text.
a = dict(one=1, two=2, three=3)
b = {'one': 1, 'two': 2, 'three': 3}
c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
d = dict([('two', 2), ('one', 1), ('three', 3)])
assert a == b == c == d
my_dict1.update(my_dict2) # merge my_dict2 into my_dict1 in place; shared keys take my_dict2's values
my_dict['key'] = 'value' # add element
del my_dict['key'] # delete element
my_dict.clear() # remove all elements
if key in my_dict: ... # test membership
# ways to view or unpack a dictionary
for pairs in my_dict.items(): ...
for k, v in my_dict.items(): ...
for k in my_dict.keys(): ...
for v in my_dict.values(): ...
print("{one}, {two}, {three}".format(**a)) # unpack dict as keyword arguments; fields must name keys
assert list(my_dict) == list(my_dict.keys()) # iterating a dictionary yields its keys
my_dict.pop('k'[, default_value]) # deletes 'k' and returns its value, or returns default_value if 'k' not found
my_dict.popitem() # deletes and returns arbitrary (k, v) pair
my_dict.setdefault('k'[, default_value]) # return v if k exists, otherwise set k=default_value, returns v
my_dict.get('k'[, default_value]) # returns default_value if k not found; otherwise returns v
my_dict['key'] = my_dict.get('key',0) + 1 # counter
# special dictionaries
from collections import OrderedDict, defaultdict
my_ordered_dict = OrderedDict() # recalls order in which it's populated
my_default_dict = defaultdict(int) # sets default_value == 0, ready to increment
my_default_dict['key'] += 1 # increment values initialized at 0
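The counting idioms above can be compared side by side. The sample sentence is made up, and `collections.Counter` is an additional standard-library class not mentioned in the notes.

```python
from collections import Counter, defaultdict

words = 'the cat and the hat and the bat'.split()

counts_get = {}
for w in words:
    counts_get[w] = counts_get.get(w, 0) + 1   # plain dict with .get()

counts_dd = defaultdict(int)
for w in words:
    counts_dd[w] += 1                          # missing keys start at 0

counts_counter = Counter(words)                # purpose-built counter

print(counts_get == dict(counts_dd) == dict(counts_counter))   # True
print(counts_counter.most_common(2))           # [('the', 3), ('and', 2)]
```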
Code should convert local datetimes to UTC, perform computations, then convert back to local datetimes for display purposes.
import datetime
import pytz # a database of timezones
# get current date/time
my_current_datetime = datetime.datetime.now()
my_current_date = datetime.date.today()
my_current_time = my_current_datetime.time()
# create naive datetime object: doesn't know its timezone
naive_datetime_from_timestamp = datetime.datetime.fromtimestamp(my_posix_timestamp)
naive_datetime = datetime.datetime(my_year, my_month, my_day[, my_hour[, my_min[, my_sec[, my_microsec]]]])
# create fixed-offset timezone objects (built from timedeltas) for timezone assignment/conversion
pacific_tz_offset = datetime.timezone(datetime.timedelta(hours=-8))
eastern_tz_offset = datetime.timezone(datetime.timedelta(hours=-5))
# create aware datetime object: knows its timezone
my_pacific_datetime = datetime.datetime(my_year, my_month, my_day, my_hour, tzinfo=pacific_tz_offset)
my_pacific_datetime = naive_datetime.astimezone(pacific_tz_offset) # a naive datetime is assumed to be in system local time
# create timedelta object to manipulate datetime objects
my_inc_5hrs = datetime.timedelta(hours=5)
# access or update a datetime object
my_incremented_datetime = my_datetime + my_inc_5hrs
my_year = my_datetime.year
# convert datetime from string
# Format string mini-language: %Y-%m-%d %H:%M:%S %Z%z
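The "convert to UTC, compute, convert back" advice and the format mini-language can be sketched with the standard library alone. The timestamp string and variable names are invented for the example, and the fixed -8 hour offset ignores daylight saving time, which a timezone database such as pytz would handle.

```python
import datetime

utc = datetime.timezone.utc
pacific = datetime.timezone(datetime.timedelta(hours=-8))   # fixed offset, no DST

# parse a string into a naive datetime, then attach its timezone
naive = datetime.datetime.strptime('2017-10-24 09:30:00', '%Y-%m-%d %H:%M:%S')
local = naive.replace(tzinfo=pacific)

# convert to UTC, compute, convert back for display
in_utc = local.astimezone(utc)
later_utc = in_utc + datetime.timedelta(hours=5)
later_local = later_utc.astimezone(pacific)

print(in_utc.strftime('%Y-%m-%d %H:%M %z'))       # 2017-10-24 17:30 +0000
print(later_local.strftime('%Y-%m-%d %H:%M'))     # 2017-10-24 14:30
```

Here `replace(tzinfo=...)` attaches a timezone without shifting the clock time, whereas `astimezone()` converts between timezones.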
Functions are pieces of reusable code that solve particular tasks. Brett Slatkin, Effective Python, p. 10:
As soon as your expressions get complicated, it's time to consider splitting them into smaller pieces and moving logic into helper functions. What you gain in readability always outweighs what brevity may have afforded you. Don't let Python's pithy syntax for complex expressions get you into a mess ...
fname(req_arg[, opt_arg])
%timeit function(argument) # in Jupyter Notebook
# more about function arguments:
# *args makes a tuple; **kwargs makes a dictionary
def my_func(positional_arg, optional_keyword_arg = default_value, *args, **kwargs):
    return my_var
my_func(2, optional_keyword_arg = my_value)
# in a function call, keyword arguments must follow positional arguments
# example of extending a function's parameters while remaining
# backwards compatible with existing callers (requires import datetime):
def log(message, when=None):
    """Log a message with a timestamp.
    message: Message to print.
    when: datetime when message occurred. Defaults to the present time.
    """
    when = datetime.datetime.now() if when is None else when # LOOK HERE
    print('%s: %s' % (when, message))
The scope of closures is tricky; see Effective Python, pp. 31-36. The general notion:
def add_to_five(num):
    def inner(): # write a nested function
        return num + 5 # inner remembers num from the enclosing scope
    return inner # return the nested function
fifteen = add_to_five(10) # store the returned function in a variable
fifteen() # call variable as function; returns 15
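One reason closure scope is tricky: a nested function can read a variable from the enclosing scope, but assigning to it creates a new local variable unless the name is declared `nonlocal` (Python 3 only). A minimal sketch with hypothetical names:

```python
def make_counter():
    count = 0                  # lives in the enclosing scope
    def increment():
        nonlocal count         # without this, count += 1 raises UnboundLocalError
        count += 1
        return count
    return increment

counter_a = make_counter()
counter_b = make_counter()     # each call gets its own independent count
print(counter_a(), counter_a(), counter_b())   # 1 2 1
```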
from functools import wraps
import logging
def logme(func):
    @wraps(func) # applies decorator from functools so inner.__name__ = func.__name__, etc.
    def inner(*args, **kwargs):
        logging.debug("Called {} with {} and {}".format(func.__name__, args, kwargs))
        return func(*args, **kwargs)
    return inner
@logme # syntactic sugar for say_hello = logme(say_hello)
def say_hello():
    print("Hello there!")
say_hello()
def my_fname(arg1, arg2)
my_str = 'value'
(snake case) instead of
(camel case)
to profile memory use and leaks
dir() # all names in current local scope
dir(my_object) # list of my_object's attributes
print(repr(f_name.__doc__)) # access docstrings
# repr returns the printable representation of an object
# helpful for debugging, to differentiate between print(5) and print('5')
print(my_object.__dict__) # to view internals, p. 204
# add docstring to functions, classes, methods:
def my_fcn():
    """this is a docstring"""
def my_multiline_fcn():
    """for a multiline docstring,
    put closing quotes on their own line
    """
import docstrings
Classes are collections of methods and attributes. An object is an instantiation of a class; everything in Python is an object.
class ClassName(ParentName1, ParentName2): # define a class and its inheritance
    vname = my_value # create attributes in a class
    def mname(self): # create method in a class
        return self.vname
    def method_override(self): # when method name duplicates a parent's method's name
        ...
    def __init__(self, arg1 = default_val): # control what happens on instantiation; arguments go here
        self.arg1 = arg1
    def __str__(self): # control results of print(my_object)
        return "{}, {}".format(self.__class__.__name__, self.vname)
from scriptname import ClassName # use a class
my_var = ClassName.vname # access attributes in the class
inst_name = ClassName(arg) # create an instance of a class
inst_name = filename.ClassName()
inst_name.vname = my_value # define attributes of an instance
Core concepts of the functional approach to programming, see also [ 1 ], [ 2 ], [ 3 ]:
def my_func(other_func): ...
Anonymous functions that we won’t need to use again; one line long; can’t contain assignments; automatically return the last value calculated.
filter(lambda book: book.pages >= 600, BOOKS)
reduce(lambda x, y: x if len(x) > len(y) else y, [s for s in strings])
Currying is the technique of translating the evaluation of a function that takes multiple arguments (or a tuple of arguments) into evaluating a sequence of functions, each with a single argument.
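Python does not curry functions automatically, but the idea can be sketched by hand with nested functions. The `volume` function and its helpers are hypothetical examples. `functools.partial` is shown alongside for contrast, since it fixes some arguments in a single step rather than producing a chain of one-argument functions.

```python
from functools import partial

def volume(length, width, height):
    return length * width * height

# hand-written curried form: one argument per call
def curried_volume(length):
    def take_width(width):
        def take_height(height):
            return volume(length, width, height)
        return take_height
    return take_width

print(volume(2, 3, 4))            # 24
print(curried_volume(2)(3)(4))    # 24

# partial application: fix the first two arguments at once
base_2x3 = partial(volume, 2, 3)
print(base_2x3(4))                # 24
```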
# map: transform every element of an iterable
# similar to list comprehension: [my_func(i) for i in my_iterable]
# prefer map to list comprehension when need to nest functions
list(map(my_func, my_iterable))
# reduce: good algorithm for summing numbers, multiplying numbers
from functools import reduce
def product(x, y): return x * y
print(reduce(product, [1, 2, 3, 4, 5])) # 120
# sorting: operator module is helpful
# access attributes of an object: attrgetter
# access items in a dict: itemgetter
from operator import attrgetter, itemgetter
sorted(my_data, key=itemgetter('dict_key_name'), reverse=True)
# key here is a keyword argument
# see also: reversed()
# filtering:
# filter function tests every item in iterable, and keeps the truthy ones
# equivalent to: [item for item in iterable if func(item)]
def is_long_book(book):
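    return book.pages >= 600
filter(is_long_book, books_data)
# builds iterator from the elements of my_iterable for which my_function(element) is truthy;
# pass None as the function to keep the elements that are themselves truthy
filter(my_function, my_iterable)
# builds iterator from the elements for which my_function(element) is falsy
import itertools
itertools.filterfalse(my_function, my_iterable)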
# partial:
from functools import partial
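def markdown(book, discount):
    return book.price * (1 - discount) # assumption: book has a price attribute
std_discount = partial(markdown, discount=.2) # std_discount(book) is markdown(book, discount=.2)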
# write doc test
def my_function():
    """Explanation of function
    >>> do this code
    expected result
    """
# run doc tests
python -m doctest <script_name>.py
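A concrete doctest might look like the sketch below. The `add` function is a hypothetical example, and calling `doctest.testmod()` inside the script is an alternative to the command-line invocation.

```python
import doctest

def add(x, y):
    """Return the sum of x and y.

    >>> add(2, 3)
    5
    >>> add('a', 'b')
    'ab'
    """
    return x + y

# run every doctest found in the current module
results = doctest.testmod()
print(results.failed)   # 0 when all examples pass
```

Each `>>>` line is executed and its output compared with the line that follows, so the docstring doubles as documentation and as a test.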
# testing the extent of testing
pip install coverage
coverage run <script_name>.py
coverage report -m # in terminal
coverage html # in browser
# unit tests
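python -m unittest <test_script_name>.py # in terminal
import unittest
class MyUnittest(unittest.TestCase):
    def test_addition(self):
        assert 4 + 5 == 9
if __name__ == '__main__':
    unittest.main() # runs the tests when the script is executed directly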
# quantitative assertions:
self.assertEqual(x, y)
self.assertNotEqual(x, y)
self.assertGreater(x, y) # x > y
self.assertLess(x, y)
self.assertGreaterEqual(x, y)
# logical assertions:
# membership assertions:
self.assertIn(x, y) # x in y?
self.assertIsInstance(x, y)
# exception assertions:
with self.assertRaises(x): # code to test
import logging
logging.basicConfig(filename='fname.log', level=logging.DEBUG)
# log levels: critical, error, warning, info, debug, notset
logging.debug('string to log') # likewise logging.info(), logging.warning(), etc.
import pdb
pdb.set_trace() # launches a pseudo-shell
# type ‘n’ to run the next line of code
# type ‘c’ to run as normal
Joseph Hellerstein's code:
import pandas as pd
def func(df):
    """
    :param pd.DataFrame df: should have a column named "hours"
    """
    if "hours" not in df.columns:
        raise ValueError("DataFrame should have a column named 'hours'.")