pkb contents > python | just under 4704 words | updated 10/24/2017

1. Environment
- 1.1. Command line
  - 1.1.1. Launch Python from Bash
  - 1.1.2. IPython
  - 1.1.3. Run Python script from Bash
  - 1.1.4. Take arguments from command line
- 1.2. Jupyter Notebook
  - 1.2.1. Working with Bash
  - 1.2.2. Magics
  - 1.2.3. Rich display
  - 1.2.4. IPython widgets
- 1.3. Managing modules
- 1.4. Which modules?
2. Language
- 2.1. Fundamental characteristics
- 2.2. Operators
- 2.3. Control flow statements
- 2.4. Comprehensions
- 2.5. Generator expressions
- 2.6. Datatypes
  - 2.6.1. Booleans
  - 2.6.2. Numerics
  - 2.6.3. Sequences
    - 2.6.3.1. Strings, bytes, & unicode
    - 2.6.3.2. Lists
    - 2.6.3.3. Queues
    - 2.6.3.4. Tuples
  - 2.6.4. Sets
  - 2.6.5. Dictionaries
  - 2.6.6. Datetimes
- 2.7. Functions
- 2.8. Closures
- 2.9. Decorators
- 2.10. Style
  - 2.10.1. Spacing
  - 2.10.2. Naming
  - 2.10.3. Structure
  - 2.10.4. Namespace & docstrings
3. Paradigms
- 3.1. Object-oriented Python
  - 3.1.1. Methods
- 3.2. Functional Python
  - 3.2.1. Lambdas
  - 3.2.2. Currying
  - 3.2.3. Map-reduce, filter, etc.
- 3.3. Test-Driven Development
  - 3.3.1. Logging, errors, & debugging
    - 3.3.1.1. Raising an error
4. Sources
- 4.1. References
- 4.2. Read
- 4.3. Unread

1. Environment

1.1. Command line

1.1.1. Launch Python from Bash

python --version
python  # launches the system default python (python 2 on many older systems)
python3  # launches some version of python 3
quit()

python fname.py  # run a script

1.1.2. IPython

IPython is a special shell that adds some functionality versus the normal Python shell:

Input/output history
Reverse search with ctrl-r
Tab completion

$ ipython
%history
In
Out

1.1.3. Run Python script from Bash

# this code will run only if the script is executed from the command line
# it won't run if the script is imported by another script
if __name__ == '__main__':
    ...

1.1.4. Take arguments from command line

import sys
script_name = sys.argv[0]
for arg in sys.argv[1:]:
    ...
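
A minimal sketch of the pattern above, wrapped in a function so it can be tried without a real command line. The function name summarize_args and the sample argument list are hypothetical; in a real script you would pass sys.argv.

```python
import sys


def summarize_args(argv):
    """Split an argv-style list into the script name and its arguments."""
    script_name = argv[0]
    args = argv[1:]
    return script_name, args


# Hypothetical argument list, standing in for sys.argv
example_argv = ['fname.py', 'input.csv', '10']
script_name, args = summarize_args(example_argv)
print(script_name, args)
```

Note that command-line arguments always arrive as strings, so '10' must be converted with int() before doing arithmetic.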

1.2. Jupyter Notebook

Jupyter Notebook has two modes: command mode, where you're manipulating cells (access via esc key) and edit mode, where you're working inside them (access via 'enter' key).

Shortcuts (view all with the esc + h keys):

dd deletes a cell

jupyter notebook  # launches JN in browser
# quit JN by typing ctrl+c twice in the command line
# share JN by uploading it to GitHub --> http://nbviewer.jupyter.org/

<object>? # view help
<object.*ing> # wildcard match

<object>?? # view source code

1.2.1. Working with Bash

!<shell command> # calling a Bash command from Jupyter Notebook
x = !cat fname.txt # saving results of Bash command to a Python variable
!cat {my_fname} # passing contents of Python variable to a Bash command

1.2.2. Magics

Help: %magic
View all: %lsmagic
Line magics: %<magic>
Cell magics: %%<magic>

# profile code
%%timeit
<code>

%timeit <code>

%%file fname.ext # create a file
%run fname.py # run a script

%matplotlib # graphing options

1.2.3. Rich display

Jake VanderPlas' code:

# create a class which defines the _repr_html_ method, returning a string of HTML
class RedText(object):
    def __init__(self, text):
        self.text = text

    def _repr_html_(self):
        return "<font color='red' size=24>" + str(self.text) + "</font>"

RedText('hello there')

# example 2
class ListDisplay(object):
    def __init__(self, L):
        self.L = L

    def _repr_html_(self):
        output = '<ul>'
        for value in self.L:
            output += "<li>" + str(value) + "</li>"
        output += "</ul>"
        return output

my_list = [1, 2, 3]
ListDisplay(my_list)

1.2.4. IPython widgets

Jake VanderPlas: "transform simple Python functions into interactive widgets"

# Initial install
!conda install ipywidgets
!jupyter nbextension enable --py widgetsnbextension

from ipywidgets import interact

1.3. Managing modules

Libraries/packages are directories of Python scripts/modules; each script contains special functions, methods, and/or types.

python3 get-pip.py
pip3 install module_name

import module_name  # use functions from module as module_name.function_name()
import module_name as nickname  # use functions as nickname.function_name()

from module_name import function_name  # partial import; use function as function_name()
from module_name import *  # full import; bad practice because:
# (1) it floods the local namespace with every public name in the module;
# (2) names you’ve defined locally or have previously imported may be overwritten;
# (3) the module's contents are no longer contained in the module's namespace

dir(module_name)  # view contents of module

1.4. Which modules?

See also: Doug Hellmann's Python Module of the Week, SciPy's directory of science-related Python resources and modules, Fredrik Lundh's tour of the Python standard library modules [pdf], the Python Module Index, and libraries included in the ActivePython and Anaconda Python distributions:

For data wrangling - collections (defaultdict), pandas (dataframes), numpy (arrays), GraphLab Create
For datetimes - datetime, pytz
For web scraping & parsing - urllib2 (Python 2; urllib.request in Python 3), requests, scrapy, beautifulsoup, robobrowser
For I/O - csv, json, lxml
For data analysis - math, statistics, random, numpy
For data visualization - matplotlib, seaborn, prettytable, tablib, bokeh (interactives)
For scientific computing - scipy (integrals, diffeqs, matrixes)
For machine learning - scikit-learn, GraphLab Create
For text analysis - nltk, re, string
For functional Python - operator, functools
For testing - nose, logging, coverage, unittest, exceptions, pdb

2. Language

2.1. Fundamental characteristics

Python prioritizes readability and simplicity; import this
Python is extremely picky about indentation, e.g. to define functions
Python is case sensitive and has several reserved words
Python uses zero-based indexing

2.2. Operators

# ASSIGNMENT operators
a = 1
b = 2; c = 3
d = a == b # returns False

# COMPARISON operators
a == b  # checking equality/equivalency; returns False, since 1 != 2
a != b  # returns True
a is b  # checking identicality; returns False
a is not b  # returns True
a <= b  # inequality
a < b  # strict inequality

# LOGIC operators are used for selection and filtering;
# also with conditional operators to control program flow, although
# multiline if/else expressions are often more readable than complex Booleans
if x > y or y != 1:
    print(x)

between1_and5 = [i for i in my_list if i > 1 and i < 5] # a list comprehension

# avoid negation of positive expressions, e.g. if not a is b
# prefer inline negation:
if x is not y:
    print(x)

all(my_iterable)  # returns True if my_iterable is empty, or all its elements are truthy
any(my_iterable)  # returns True if at least one element is truthy; returns False if my_iterable is empty
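
A short sketch of how all() and any() behave on empty and mixed iterables. The variable names and sample values are illustrative only.

```python
empty = []
mixed = [1, 0, 3]      # 0 is falsy, 1 and 3 are truthy
all_truthy = [1, 2, 3]

print(all(empty))       # True: there is no falsy element
print(any(empty))       # False: there is no truthy element
print(all(mixed))       # False: 0 is falsy
print(any(mixed))       # True: 1 is truthy
print(all(all_truthy))  # True
```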

2.3. Control flow statements

# CONDITIONAL operators:
if x > 2:
    continue  # jumps to next iteration
elif x < 0:
    if x == -1:  # conditionals can be nested
        break  # completely exits loop
else:
    # if included, the else clause must come last: if, elif, else
    pass

# there is a conditional execution structure for errors;
# this is called catching an exception:
try:  # to run code based on input
    ...
except ValueError:  # catch a specific exception, e.g. ask for better input
    ...
else:  # execute if try was successful; visually distinguishes the success case
    ...
finally:  # always runs, whether or not an exception occurred, e.g. close file handles
    ...

# DEFINITE loop:
for i in [set]: …
# or range(x) in python3 // xrange(x) in python2
# or range(len(my_list)) in python3 // xrange(len(my_list)) in python2
# or i, em in enumerate(my_iterable[, starting_index])

# INDEFINITE loop:
while [condition]: …
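
A runnable sketch of the try/except/else/finally structure, recording which clauses execute. The function name parse_int and the events list are hypothetical helpers added for illustration.

```python
def parse_int(text):
    """Try to convert text to an int, recording which clauses run."""
    events = []
    try:
        value = int(text)
    except ValueError:
        # Runs only when int() fails
        events.append('except')
        value = None
    else:
        # Runs only when the try block raised nothing
        events.append('else')
    finally:
        # Runs in both cases
        events.append('finally')
    return value, events


print(parse_int('42'))
print(parse_int('forty-two'))
```

The finally clause runs on both the success path and the failure path, which is why it suits cleanup such as closing file handles.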

2.4. Comprehensions

Often, for loops can be conveniently replaced with a comprehension. Comprehensions can be fairly complex, but at a certain point it's better to switch back to a loop.

squares = [i**2 for i in range(10)]  # list comprehension
squares3 = [i**2 for i in range(30) if i%3==0]   # conditional list comprehension
two_filter = [x for x in a if x>4 if x%2==0]  # multiconditional list comprehension
squared = [[x**2 for x in row] for row in matrix]  # nested list comprehensions

grid_list = [(x, y) for x in rows for y in cols]   # list comprehension returns list of tuples
doubled_set = {num * 2 for num in [5, 2, 18, 2, 42, 2]}  # set comprehension

# dictionary comprehensions
# (avoid naming variables set or dict, which shadows the built-ins)
squares_dict = {i: i**2 for i in range(20)}
my_dict = {letter: num for letter, num in zip('abcdef', range(1, 7))}
transposed_dict = {my_dict[key]: key for key in my_dict}

2.5. Generator expressions

A generator expression, also called a naked comprehension, is useful for processing large datasets because intermediate results are not stored, so RAM isn't overwhelmed.

Generators are "stateful"
A for loop over an already exhausted generator raises no error and simply produces nothing (calling next() on it directly raises StopIteration); see pp. 40-41 of Effective Python.
Generators are great for functional programming; they execute very quickly when chained together

it = (len(x) for x in open('myfile.txt'))
print(next(it))
print(next(it))

roots = ((x, x**0.5) for x in it)
print(next(roots))

sum(i**2 for i in range(10))

list(my_generator(data))  # to convert generator to list, but why??
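
A small sketch of the "stateful" behavior described above: a generator can be consumed only once. The variable names are illustrative.

```python
squares = (i**2 for i in range(3))

first_pass = list(squares)   # consumes the generator
second_pass = list(squares)  # nothing left, and no error is raised

# Calling next() on an exhausted generator does raise StopIteration
try:
    next(squares)
    exhausted = False
except StopIteration:
    exhausted = True

print(first_pass, second_pass, exhausted)
```

If the results are needed more than once, store them in a list or recreate the generator.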

2.6. Datatypes

Overview of standard types :

Numerics are integers, floats, complex(re,im), decimals
Sequences are strings, lists, queues, tuples, ranges
Strings, bytes, unicode are character types
Collections AKA containers are lists, queues, tuples, sets, dicts;
Collections support operators in, not in
Not all containers are iterable, see iterables vs iterators vs generators
The only container also a mapping is dict

print(type(my_var))  # check type
print(repr(my_var))  # printable representation; differentiates '5' and 5 when printing

str()  # convert to string
text.decode('utf-8')  # convert bytes to unicode
text.encode('utf-8')  # convert unicode to bytes

int()  # convert numeric to integer
my_int.to_bytes(length, byteorder='big', signed=False)  # OverflowError if length is too small to hold my_int
int.from_bytes(my_bytes, byteorder='little', signed=True)  # class method: builds an int from bytes
my_integer.bit_length()  # how many bits to represent an integer?

# also bool(), float()

2.6.1. Booleans

In addition to Boolean operands True and False, all Python objects have truth values
None, 0 for any numeric type, and empty collections evaluate as False

2.6.2. Numerics

my_float = 5.519
abs(my_float)
sum(my_iterable)  # sum up numerics stored in an iterable
round(my_float[, n])  # round float to n digits; if n is omitted, returns the nearest integer

# standard numeric operators;
# Python converts types as necessary to perform operations:
2 + 3  # addition
2 - 3  # subtraction
2 * 3  # multiplication
2 ** 3  # exponentiation
6 / 3  # division
7 % 2  # modulo; returns remainder of division, e.g. 7 = 2*3 + 1, so the remainder is 1
5 // 2  # floor division, AKA integer division; rounds the quotient down; e.g. 5//2 = 2
divmod(5, 2)  # returns (x//y, x%y)

import math
math.factorial(my_integer)
math.sqrt(my_float)
math.pi  # a constant
math.e  # a constant
math.gcd(my_int1, my_int2)  # greatest common divisor; arguments must be integers
math.trunc(my_float)  # truncates float to integer part, without rounding
math.floor(my_float)  # greatest integer less than or equal to x
math.ceil(my_float)  # smallest integer greater than or equal to x
math.log(my_float[, base])

# the decimal library is useful for currency:
from decimal import Decimal, ROUND_UP
my_decimal_price = Decimal('5.003')
my_decimal_price.quantize(Decimal('0.01'), rounding=ROUND_UP)  # returns Decimal('5.01')
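
A brief sketch of why decimal suits currency: binary floats cannot represent most decimal fractions exactly, while Decimal values built from strings can. The sample amounts are arbitrary.

```python
from decimal import Decimal

float_total = 0.1 + 0.2
decimal_total = Decimal('0.1') + Decimal('0.2')

print(float_total == 0.3)               # False: binary floating-point rounding error
print(decimal_total == Decimal('0.3'))  # True: exact decimal arithmetic
```

Construct Decimal values from strings rather than floats, otherwise the float's rounding error is carried into the Decimal.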

2.6.3. Sequences

# Operations supported for all sequences:
x not in s  # membership check; returns True or False
x in s

s * n  # repeats s n times, n is an integer
s + t  # concatenation
len(s)
min(s)
max(s)

# Addressing for all sequences:
s.index(x[, i[, j]])  # find index of element x between optional i (inclusive) to j (exclusive)
s[i:j:k]  # slice s, taking every kth item from index i (inclusive) to j (exclusive)
# More slicing syntax: s[:], s[i:], s[:j], s[-3:-1]
# Stride k can be negative, but keep it positive to avoid confusion
# For readability, consider two statements: one to stride, the next to slice
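
A sketch of the slicing and indexing syntax above applied to a sample string. The string is arbitrary; the same expressions work on lists and tuples.

```python
s = 'abcdefgh'

every_other = s[1:6:2]         # indices 1, 3, 5
near_end = s[-3:-1]            # third-from-last up to, not including, the last
backwards = s[::-1]            # negative stride reverses the sequence
position = s.index('d', 2, 6)  # search for 'd' between indices 2 and 6

print(every_other, near_end, backwards, position)
```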

2.6.3.1. Strings, bytes, & unicode

Like lists, strings are composed of elements that can be accessed via their index. Unlike lists, strings are immutable: individual elements cannot be deleted or modified.

Your program should use unicode at its core, with helper functions to convert input and output:
Convert bytes (or string of bytes) to unicode: text.decode('utf-8')
Convert unicode string to bytes: text.encode('utf-8')
In Python 3, str() is unicode, bytes() is raw 8-bit. In Python 2, str() is raw 8-bit, unicode() is unicode
String output is formatted with .format() and its mini-language, which is generally preferred over the older % formatting

my_string1 = 'allows embedded "double" quotes'
my_string2 = "allows embedded 'single' quotes"
my_string3 = 'quotes can be \'escaped\' using the backslash character'

print("Hello " + user_name + ", how are you doing?") # string concatenation

# split string
my_str.partition(sep)  # returns 3-tuple: (str_before_separator, separator, str_after_separator)
my_str.split(sep=None, maxsplit=-1)  # split string every time delimiter occurs or #maxsplit
my_str.splitlines([keepends])  # keepends is a Boolean

' '.join(my_iterable)  # join the strings in an iterable, using ' ' as the separator between elements
my_string.replace(old, new[, count])  # optional 'count' specifies #instances to replace
my_string.isalpha()  # False if nonalphabetic character in string
my_string.zfill(width)  # left-pads a string with zeros
my_string.ljust(width[, fillchar])
# Many of these methods have counterparts that start from the end of the string
# s.rindex(), s.rfind(), s.rpartition(), etc.

2.6.3.2. Lists

Lists store multiple elements of any type, including mixed type and including other lists. Lists are mutable; unlike string methods, most list methods alter the list in-place and return None. Lists are both sequences and containers.

my_list = list()
my_list = []
my_list = list('abc')
my_list = ['a', 'b', 'c']
my_list = [i for i in range(len(n))]

b = a  # an ALIAS, not a copy! a is b; changes to b affect a
b = list(a) # copies list; b is equivalent, but not identical to a
b = a[:]  # copies list; b is equivalent, but not identical to a

em in my_list  # check membership

my_list[i] = em  # update list; em replaces the element at index i
my_list[i:j] = em  # elements of iterable em replace slice i:j, even if len(em) != len(my_list[i:j])
my_list.insert(index, em)  # adds element at index
my_list.append(em)  # adds a single element at end of list; appending a list nests it
my_list1 + my_list2  # returns a new list; my_list1 and my_list2 are unchanged
my_list.extend(my_list2)  # adds each element of an iterable at end of list, in place; faster for large lists

my_list.remove(em)
del my_list[i:j]
my_list.pop([i])  # deletes and returns last element, or ith element

' '.join(my_list)  # join a list of strings, using ' ' as the separator between elements
my_list.reverse()  # reverses list elements in-place
sum(my_list)  # if list elements are numerics
my_list.sort()  # sorts list elements in place
my_sorted_list = sorted(my_list, reverse=False)  # returns sorted copy of unaltered list
# https://wiki.python.org/moin/HowTo/Sorting

2.6.3.3. Queues

Use a double-ended queue, a list-like datatype, when you need to quickly insert or remove items from the end and beginning (deques are a stack-queue hybrid):

from collections import deque
my_deque = deque()

my_deque.appendleft(em)  # add element to left
my_deque.insert(i, em)  # add element at specified index
my_deque.append(em)  # add element to right
my_deque.reverse()  # reverses elements in place, returning None

my_deque.popleft()  # remove and return element from left
my_deque.pop()  # remove and return element from right

Use a heap queue when you want a list whose smallest element is always first (the rest of the list is only partially ordered):

from heapq import heapify, heappop, heappush
my_heap = list()
heappush(my_heap, 3)  # add element
heappush(my_heap, 5)
heappush(my_heap, 1)

my_heap = [3, 5, 1]
heapify(my_heap)  # rearranges the list in place and returns None

my_heap[0]  # always returns lowest number; here, 1
print(heappop(my_heap), heappop(my_heap))  # removes and prints lowest, next lowest, etc.; here 1, 3
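
A runnable sketch of both ways to build a heap with the standard heapq module. The sample numbers match those used above; the variable names are illustrative.

```python
import heapq

# Build a heap by pushing elements one at a time
pushed = []
for number in [3, 5, 1]:
    heapq.heappush(pushed, number)
smallest = pushed[0]  # the smallest element sits at index 0

# Or turn an existing list into a heap in place
numbers = [3, 5, 1]
heapify_result = heapq.heapify(numbers)  # returns None; numbers itself is reordered

# Popping repeatedly yields the elements in ascending order
drained = [heapq.heappop(numbers) for _ in range(len(numbers))]
print(smallest, heapify_result, drained)
```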

2.6.3.4. Tuples

Tuple addressing works like list addressing; unlike lists, though, tuples are immutable. When comparing tuples, Python proceeds on an index-by-index basis. Tuples are used for composite dictionary keys and multivariable assignment:

my_tuple = 'a',
my_tuple = 'a','b','c','d','e'
my_tuple = tuple(my_iterable)

a = 1,2,
b,c = a  # multivariable assignment, aka unpacking a tuple; b=1, c=2
d=b,c  # packing a tuple; d=1,2 and a==d

directory_dict[last, first] = 'phone_number'  # tuple as composite key
for last, first in directory_dict:
    print(first, last, directory_dict[last, first])

2.6.4. Sets

The value of sets is access to set operations; by design, sets lack slicing and indexing:

my_set1 = {'a', 'b'}
my_set2 = set(['a','b','c'])

x in my_set1  # True if x an element of set

words_unique = list(set(words))  # find unique values

my_set1.add(elem)
my_set1.remove(elem)  # raises KeyError if elem not in set
my_set1.discard(elem)
my_set1.pop()  # remove and return arbitrary element
my_set1.clear()  # deletes all elements

my_set1.isdisjoint(my_set2)  # True if nonoverlapping sets
my_set1.issubset(my_set2)
my_set1.union(my_set2)  # creates a new set from union of sets
my_set1.intersection(my_set2)  # creates a new set from intersection of sets
my_set1.difference(my_set2)  # creates a new set: set1 - set2
my_set1.symmetric_difference(my_set2)  # creates a new set: (set1-set2)U(set2-set1)
# many of these operations have more mathematical-looking alternative notation:
# https://docs.python.org/3.5/library/stdtypes.html#set-types-set-frozenset
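
A sketch of the operator notation for the set methods listed above, using two small sample sets (the names small and large are illustrative).

```python
small = {'a', 'b'}
large = {'a', 'b', 'c'}

union = small | large            # same as small.union(large)
common = small & large           # same as small.intersection(large)
only_in_large = large - small    # same as large.difference(small)
in_one_only = small ^ large      # same as small.symmetric_difference(large)
is_subset = small <= large       # same as small.issubset(large)

print(union, common, only_in_large, in_one_only, is_subset)
```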

2.6.5. Dictionaries

A dictionary maps keys to values; values are retrieved via their key, doing away with indices. A dictionary is much faster to search than a list, and is often used to count letter or word occurrences in a block of text.

a = dict(one=1, two=2, three=3)
b = {'one': 1, 'two': 2, 'three': 3}
c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
d = dict([('two', 2), ('one', 1), ('three', 3)])
assert a == b == c == d

my_dict1.update(my_dict2)  # merge my_dict2 into my_dict1 in place; values from my_dict2 win on shared keys
my_dict['key'] = 'value'  # add element
del my_dict['key']  # delete element
my_dict.clear()  # remove all elements
if key in my_dict: ...  # test membership

# ways to view or unpack a dictionary
for pairs in my_dict.items(): ...
for k, v in my_dict.items(): ...
for k in my_dict.keys(): ...
for v in my_dict.values(): ...
print("{one}, {two}".format(**my_dict))  # unpack dict into keyword arguments; field names must match keys
assert list(my_dict) == list(my_dict.keys())  # iterating over a dict yields its keys

my_dict.pop('k'[, default_value])  # deletes key 'k' and returns its value, or returns default_value if 'k' is missing
my_dict.popitem()  # deletes and returns arbitrary (k, v) pair
my_dict.setdefault('k'[, default_value])  # return v if k exists, otherwise set k=default_value, returns v
my_dict.get('k'[, default_value])  # returns default_value if k not found; otherwise returns v
my_dict['key'] = my_dict.get('key',0) + 1  # counter

# special dictionaries
from collections import OrderedDict, defaultdict
my_ordered_dict = OrderedDict()  # recalls order in which it's populated
my_default_dict = defaultdict(int)  # sets default_value == 0, ready to increment
my_default_dict['key'] += 1  # increment values initialized at 0
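
A sketch of the word-counting use case mentioned at the top of this section, done three ways. The sample sentence is made up, and collections.Counter is an addition not covered in the notes above.

```python
from collections import Counter, defaultdict

words = 'the cat and the hat'.split()

# 1. Plain dict with get() supplying a default of 0
counts_get = {}
for word in words:
    counts_get[word] = counts_get.get(word, 0) + 1

# 2. defaultdict(int) initializes missing keys at 0
counts_default = defaultdict(int)
for word in words:
    counts_default[word] += 1

# 3. Counter does the counting in one call
counts_counter = Counter(words)

print(counts_get)
```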

2.6.6. Datetimes

Code should convert local datetimes to UTC, perform computations, then convert back to local datetimes for display purposes.

import datetime
import pytz  # a database of timezones
# http://www.saltycrane.com/blog/2009/05/converting-time-zones-datetime-objects-python/

# get current date/time
my_current_datetime = datetime.datetime.now(tz=my_timezone)
my_current_date = my_current_datetime.date()
my_current_time = my_current_datetime.time()

# create naive datetime object: doesn't know its timezone
naive_datetime_from_timestamp = datetime.datetime.fromtimestamp(my_posix_timestamp)
naive_datetime = datetime.datetime(my_year, my_month, my_day[, my_hour[, my_min[, my_sec[, my_microsec]]]])

# create timezone objects (fixed UTC offsets built from timedeltas) for timezone assignment/conversion
pacific_tz_offset = datetime.timezone(datetime.timedelta(hours=-8))
eastern_tz_offset = datetime.timezone(datetime.timedelta(hours=-5))

# create aware datetime object: knows its timezone
my_pacific_datetime = datetime.datetime(my_year, my_month, my_day, my_hour, tzinfo=pacific_tz_offset)
my_pacific_datetime = my_naive_datetime.astimezone(pacific_tz_offset)  # a naive datetime is assumed to be in system local time

# create timedelta object to manipulate datetime objects
my_inc_5hrs = datetime.timedelta(hours=5)

# access or update a datetime object
my_incremented_datetime = my_datetime + my_inc_5hrs
my_datetime.replace(hour=my_new_hour)  # returns a new datetime; the original is unchanged
my_year = my_datetime.year

# convert datetime to a string, or parse a datetime from a string
my_datetime.strftime(my_format_string)
datetime.datetime.strptime(my_date_string, my_format_string)
# Format string mini-language: %Y-%m-%d %H:%M:%S %Z%z
# https://docs.python.org/3/library/datetime.html?highlight=datetime#strftime-and-strptime-behavior
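
A sketch of the local-to-UTC-and-back workflow recommended at the top of this section, using only the standard library. It assumes a fixed UTC-8 offset that ignores daylight saving time; the date and variable names are arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed offset standing in for Pacific time (no DST handling)
pacific = timezone(timedelta(hours=-8))

local_start = datetime(2017, 10, 24, 9, 0, tzinfo=pacific)  # aware datetime
utc_start = local_start.astimezone(timezone.utc)            # convert to UTC
utc_end = utc_start + timedelta(hours=5)                    # compute in UTC
local_end = utc_end.astimezone(pacific)                     # convert back for display

formatted = local_end.strftime('%Y-%m-%d %H:%M %z')
print(formatted)
```

For real timezones with daylight saving rules, a timezone database such as pytz is the better fit than a fixed offset.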

2.7. Functions

Functions are pieces of reusable code that solve particular tasks. Brett Slatkin, Effective Python, p. 10:

As soon as your expressions get complicated, it's time to consider splitting them into smaller pieces and moving logic into helper functions. What you gain in readability always outweighs what brevity may have afforded you. Don't let Python's pithy syntax for complex expressions get you into a mess ...

Notation: fname(req_arg[, opt_arg])
Function calls can be nested: print(type(var_name))
With nesting, inner functions have access to the scope of the outer functions
Functions can be recursive (can return a call to themselves)
Write functions to raise exceptions; expect the calling code to handle exceptions

help(fname.mname)
%timeit function(argument)  # in Jupyter Notebook

# more about function arguments:
# http://stackoverflow.com/a/1419160
# http://markmiyashita.com/blog/python-args-and-kwargs/
# http://geekodour.blogspot.com/2015/04/args-and-kwargs-in-python-explained.html
# https://docs.python.org/3.5/tutorial/controlflow.html#more-on-defining-functions
# *args makes a tuple; **kwargs makes a dictionary
def my_func(positional_arg, optional_keyword_arg=default_value, *args, **kwargs):
    ...
    return my_var

my_func(2, optional_keyword_arg=my_value)
# in a function call, keyword arguments must follow positional arguments

# example of extending a function's parameters while remaining
# backwards compatible with existing callers:
from datetime import datetime

def log(message, when=None):
    """ Log a message with a timestamp.
    Args:
        message: Message to print.
        when: datetime when message occurred. Defaults to the present time.
    """
    when = datetime.now() if when is None else when  # LOOK HERE
    print('%s: %s' % (when, message))
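
A sketch showing how *args collects extra positional arguments into a tuple and **kwargs collects extra keyword arguments into a dictionary. The function name describe and its parameters are hypothetical.

```python
def describe(required, optional='default', *args, **kwargs):
    """Return each kind of argument so the grouping is visible."""
    return required, optional, args, kwargs


minimal = describe(1)
full = describe(1, 2, 3, 4, flag=True)

print(minimal)
print(full)
```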

2.8. Closures

The scope of closures is tricky; see Effective Python , pp. 31-36. The general notion:

def add_to_five(num):
    def inner():  # write a nested function
        print(num + 5)  # inner can read num from the enclosing scope

    return inner  # return the nested function

fifteen = add_to_five(10)  # store the returned function (which remembers num=10) as a variable
fifteen()  # call variable as function; prints 15
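
One tricky part of closure scope is assignment: an inner function can read an enclosing variable freely, but needs nonlocal to rebind it. A minimal sketch, with hypothetical names make_counter and increment:

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # without this, count += 1 raises UnboundLocalError
        count += 1
        return count

    return increment


counter_a = make_counter()
counter_b = make_counter()

first = counter_a()
second = counter_a()
other = counter_b()  # each closure keeps its own count
print(first, second, other)
```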

2.9. Decorators

Brett Slatkin: Decorators are Python syntax for allowing one function to modify another function at runtime.
Introduction to decorators

import functools
import logging

def logme(func):
    logging.basicConfig(level=logging.DEBUG)

    @functools.wraps(func)  # applies decorator from functools so inner.__name__ = func.__name__, etc.
    def inner(*args, **kwargs):
        logging.debug("Called {} with {} and {}".format(func.__name__, args, kwargs))
        return func(*args, **kwargs)

    return inner

@logme
def say_hello():
    print("Hello there!")

say_hello() # syntactic sugar!!
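
A second decorator sketch in the same shape, this time counting calls instead of logging them so the effect is easy to inspect. The names count_calls and add are hypothetical.

```python
import functools


def count_calls(func):
    """Decorator that counts how many times func is called."""

    @functools.wraps(func)  # keep func's __name__ and __doc__ on the wrapper
    def inner(*args, **kwargs):
        inner.calls += 1
        return func(*args, **kwargs)

    inner.calls = 0
    return inner


@count_calls
def add(x, y):
    """Add two numbers."""
    return x + y


total = add(1, 2)
add(3, 4)
print(total, add.calls, add.__name__)
```

Writing @count_calls above the definition is equivalent to add = count_calls(add) after it.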

2.10. Style

See PEPs 20, 290, 291, 345, but most importantly 8
See Google's style guide
Use pycodestyle, pylint or yapf to automate style

2.10.1. Spacing

Use 2 empty lines between top-level functions and classes
Use 1 empty line between methods inside a class
Indent with 4 spaces (don't tab)
Use spaces after commas: def my_fname(arg1, arg2)
Use spaces around assignments and other operators: my_str = 'value'
Don't use spaces around operators in function calls: my_fname(kwarg=my_var)
Put two spaces between code and inline comments
Avoid single line if, for, while, excepts

2.10.2. Naming

Variable names should be informative nouns; also
Avoid single letter names since they might conflict with pdb (debugging library)
For globals: MODULE_LEVEL_CONSTANT
Function names should be informative verb phrases
For functions and variables, use my_fname (snake case) instead of MyFname (camel case)
For classes and exceptions use camel case
For methods and attributes: _protected_instance_attribute or __private_instance_attribute

2.10.3. Structure

Group common operations into functions; group common functions into classes
Put import statements at top of file with one library per line, in order:
standard library,
3rd party modules,
own modules
79 characters or less per line
When an expression exceeds 79 characters, indent it 4 characters past its normal indentation level on the next line

2.10.4. Namespace & docstrings

Profile before optimizing; use tracemalloc to profile memory use and leaks
PEP 257: write docstrings for every function, class, and module.
function docstrings should describe the purpose of the function, its arguments (incl. default values, args, kwargs), its return value/s, and any exceptions that callers should handle.
class docstrings should describe the purpose of the class, public attributes and methods, how to interact with protected attributes.
module docstrings should specify the module's purpose and the classes/functions available in the module.

dir()  # all names in current local scope
dir(my_object)  # list of my_object's attributes

help()
help(my_object)

print(repr(f_name.__doc__))  # access docstrings
# repr returns the printable representation of an object
# helpful for debugging, to differentiate between print(5) and print('5')

print(my_object.__dict__)  # to view internals, p. 204

# add docstring to functions, classes, methods:
def my_fcn():
  """this is a docstring"""

  """for a multiline docstring,
  put closing quotes on their own line
  """

import docstrings
help(docstrings.function_name)

3. Paradigms

3.1. Object-oriented Python

Classes are collections of methods and attributes. An object is an instantiation of a class; everything in Python is an object.


class ClassName(ParentName1, ParentName2):  # define a class and its inheritance

    def mname(self):  # create method in a class
        self.vname = my_value  # create attributes on the instance via self
        return self.vname

    def method_override(self):  # when method name duplicates a parent's method's name
        ...

    def __init__(self, arg1=default_val):  # control what happens on instantiation; receives the arguments
        ...

    def __str__(self):  # control results of print(my_object)
        return '{}, {}'.format(self.__class__.__name__, self.vname)

from scriptname import ClassName  # use a class
my_var = ClassName.vname  # access attributes in the class
inst_name = ClassName(arg)  # create an instance of a class
inst_name = filename.ClassName()  # if the module was imported with: import filename
inst_name.vname = my_value  # define attributes of an instance

3.1.1. Methods

Methods are functions inherited from an object's class
Find methods for datatype: help(type_name)
Methods can be chained: my_str.lower().strip()
CAUTION: some methods change the object they’re called on

3.2. Functional Python

Core concepts of the functional approach to programming, see also [1], [2], [3]:

Computation is the evaluation of functions; code should be mostly functions
Programming is done with expressions: pass the output of a function to another
either by chaining (possible if each function returns an iterable)
or by using intermediary variables
No side-effects from computation: a function shouldn’t change values that are outside its proper scope (e.g. defining a variable in a function that reaches outside the function: def my_function(): global var_name, nonlocal var_name)
be especially careful about mutable objects like lists & dicts
Functions are first-class citizens, since they can be both inputs to and outputs from other functions
- def my_func(other_func): ...
Functions should be limited in scope and functionality
e.g., not too many arguments
e.g., prefer short functions that do one thing

3.2.1. Lambdas

Anonymous functions that we won’t need to use again; limited to a single expression; can’t contain assignment statements; automatically return the value of that expression.

from functools import reduce
filter(lambda book: book.pages >= 600, BOOKS)
reduce(lambda x, y: x if len(x) > len(y) else y, strings)  # returns the longest string

3.2.2. Currying

Currying is the technique of translating the evaluation of a function that takes multiple arguments (or a tuple of arguments) into evaluating a sequence of functions, each with a single argument.
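
A minimal sketch of currying by hand with nested functions. The three-argument volume calculation and all names are hypothetical; Python has no built-in curry, and functools.partial offers a related but different tool (fixing some arguments of an ordinary function).

```python
def volume(length, width, height):
    """Ordinary function taking three arguments at once."""
    return length * width * height


def curried_volume(length):
    """Curried form: a chain of functions, each taking one argument."""
    def with_width(width):
        def with_height(height):
            return length * width * height
        return with_height
    return with_width


direct = volume(2, 3, 4)
curried = curried_volume(2)(3)(4)

# Intermediate functions can be stored and reused
base_2_by_3 = curried_volume(2)(3)
taller = base_2_by_3(10)
print(direct, curried, taller)
```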

3.2.3. Map-reduce, filter, etc.

# map: transform every element of an iterable
# similar to list comprehension: [my_func(i) for i in my_iterable]
# prefer map to list comprehension when you need to nest functions
list(map(my_func, my_iterable))

# reduce: good algorithm for summing numbers, multiplying numbers
from functools import reduce
def product(x, y): return x * y
print(reduce(product, [1, 2, 3, 4, 5]))  # prints 120

# sorting: operator module is helpful
# access attributes of an object: attrgetter
# access items in a dict: itemgetter
from operator import itemgetter
sorted(my_data, key=itemgetter('dict_key_name'), reverse=True)
# key here is a keyword argument
# see also: reversed()

# filtering:

# filter function tests every item in iterable, and keeps the truthy ones
# equivalent to: [item for item in iterable if func(item)]
def is_long_book(book):
    return book.pages >= 600
filter(is_long_book, books_data)

# builds iterator from the truthy elements of my_iterable, or from elements where my_function(element) is truthy
filter(None, my_iterable)
filter(my_function, my_iterable)

# builds iterator from my_iterable's falsy elements
import itertools
itertools.filterfalse(None, my_iterable)

# partial:
from functools import partial
def markdown(book, discount):
    ...
std_discount = partial(markdown, discount=.2)
print(std_discount(my_data))
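
A sketch chaining filter, map, and reduce into one pipeline. The books list is made-up sample data, stored as dicts rather than the objects used in the snippets above.

```python
from functools import reduce
from operator import itemgetter

# Hypothetical sample data
books = [
    {'title': 'A', 'pages': 700},
    {'title': 'B', 'pages': 150},
    {'title': 'C', 'pages': 620},
]

long_books = filter(lambda book: book['pages'] >= 600, books)  # lazy iterator
page_counts = map(itemgetter('pages'), long_books)             # lazy iterator
total_pages = reduce(lambda x, y: x + y, page_counts)          # consumes the chain

print(total_pages)
```

Because filter and map return lazy iterators in Python 3, nothing is computed until reduce pulls values through the chain.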

3.3. Test-Driven Development

TDD = write tests for code before writing code itself
Doc tests are based on string comparison, so may have issues w/ floats; very code-specific, not portable
coverage
unittest

# write doc test: a line starting with >>> is run, and the next line is its expected output
def my_function():
  """Explanation of function
  >>> my_function()
  expected_result
  """

# run doc tests
python -m doctest filename.py

# testing the extent of testing
pip install coverage
coverage run tests.py
coverage report -m  # in terminal
coverage html  # in browser

# unit tests
python -m unittest tests.py

import unittest
class MyUnitTest(unittest.TestCase):
  def test_addition(self):
    assert 4 + 5 == 9

if __name__ == '__main__':
  unittest.main()

# quantitative assertions:
self.assertEqual(x, y)
self.assertNotEqual(x, y)
self.assertGreater(x, y)  # x > y
self.assertLess(x, y)
self.assertGreaterEqual(x, y)
self.assertLessEqual(x,y)

# logical assertions:
self.assertTrue(my_function(test_value))
self.assertFalse(my_function(test_value))

# membership assertions:
self.assertIn(x, y)  # x in y?
self.assertNotIn(x,y)
self.assertIsInstance(x, y)

# exception assertions:
with self.assertRaises(x):  # code to test goes inside the with block
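
A self-contained sketch that puts several of these assertions into one test case and runs it programmatically. The function safe_divide and the test class are hypothetical examples.

```python
import unittest


def safe_divide(x, y):
    """Divide x by y, raising ValueError on a zero divisor."""
    if y == 0:
        raise ValueError('divisor must be nonzero')
    return x / y


class SafeDivideTest(unittest.TestCase):
    def test_quotient(self):
        self.assertEqual(safe_divide(6, 3), 2)

    def test_result_type(self):
        self.assertIsInstance(safe_divide(1, 2), float)

    def test_zero_divisor(self):
        with self.assertRaises(ValueError):
            safe_divide(1, 0)


# Run the tests without relying on the command line
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SafeDivideTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```

In a real project the last three lines are usually replaced by unittest.main() or by running python -m unittest from the shell.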

3.3.1. Logging, errors, & debugging

import logging
logging.basicConfig(filename='fname.log', level=logging.DEBUG)
# log levels: critical, error, warning, info, debug, notset
logging.info('string to log')
# https://docs.python.org/3/howto/logging.html

import pdb
pdb.set_trace()  # launches a pseudo-shell
# type 'help' to list debugger commands
# type 'n' to run the next line of code
# type 'c' to run as normal
# type 'exit()' to quit the debugger

3.3.1.1. Raising an error

Joseph Hellerstein's code:

import pandas as pd
def func(df):
    """
    :param pd.DataFrame df: should have a column named "hours"
    """
    if "hours" not in df.columns:
        raise ValueError("DataFrame should have a column named 'hours'.")

4. Sources

Generator comprehensions
DataCamp - Intermediate Python for data science
DataCamp - 18 most common questions about Python lists
The Hitchhiker's Guide to Python
Google's Python class, with exercises
Full stack Python tutorial
Lynda - Introduction to data analysis using Numpy
Udacity - Object-oriented programming with Python
Computational statistics in Python
Python's magic methods
Applying operations over pandas dataframes
High Performance Python
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Doing Math With Python
IPython Interactive Computing and Visualization Cookbook
Data Wrangling with Python
Crazy! visualizes how Python executes your code
donnemartin's IPython notebooks repository
nborwankar's IPython notebooks repository
PythonChallenge.com
Python quiz
Intermediate Python challenges