Python Core language
Miscellaneous
is vs ==
PEP 8 Guide:
- Comparisons to singletons like None should always be done with 'is' or 'is not', never the equality operators.
  - How many singletons are there? Five: None, True, False, NotImplemented and Ellipsis
- Also, beware of writing "if x" when you really mean "if x is not None", e.g. when testing whether a variable or argument that defaults to None was set to some other value. The other value might have a type (such as a container) that could be false in a boolean context!
if var is False vs if not var:
# 1 - will get printed only if var is False
if var is False:
    print('nice')
# 2 - will get printed in all Falsey conditions like None, [], {}, "", 0, 0.0
if not var:
    print('cool')
null checking a variable
# 1 - true if a is not None
if a is not None:
    pass
# 2 - true if a is truthy
if a:
    pass
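A small sketch of the two pitfalls above: equality versus identity, and testing a None default with a plain truthiness check. The function name add_tag is hypothetical and only serves the illustration.

```python
# Equality compares values; identity compares objects.
a = [1, 2]
b = [1, 2]
same_value = a == b      # True: equal contents
same_object = a is b     # False: two distinct list objects


def add_tag(tag, tags=None):
    """Append tag to tags, creating a new list only when none was passed."""
    # 'if not tags' would also replace an empty list the caller passed in,
    # so test for None explicitly.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


shared = []
result = add_tag("x", shared)   # the caller's (empty) list is reused
```

With "if not tags" instead, the empty list passed by the caller would be silently swapped for a fresh one, and shared would stay empty.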
*args and **kwargs
*args allows for any number of optional positional arguments, which will be assigned to a tuple named args.
**kwargs allows for any number of optional keyword arguments, which will be in a dict named kwargs.
Packing vs Unpacking
- In a function definition, ** packs keyword arguments into a dictionary.
- In a function call, ** unpacks a dictionary into keyword arguments (and * unpacks an iterable into positional arguments) to be received by the function definition.
Trick: * unpacking is available in function calls and assignments and, since Python 3.5, inside list, tuple, and set displays.
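Packing and unpacking side by side, as a minimal sketch. The function name report and the sample values are made up for the illustration.

```python
def report(*args, **kwargs):
    """Return the packed positional and keyword arguments unchanged."""
    return args, kwargs


# Packing: the definition collects the extra arguments.
packed = report(1, 2, name="x")

# Unpacking: the call spreads a sequence and a dict back out.
values = [1, 2]
options = {"name": "x"}
unpacked = report(*values, **options)
```

Both calls end up with the same tuple of positional arguments and the same dict of keyword arguments.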
Tuple Unpacking
Tuple unpacking, also called "Multiple assignment", can be used with any iterable.
x, y = 10, 20
y, x = x, y
(x, y)
point = 10, 20, 30
x, y, z = point
(x, y, z)
for k, v in person_dictionary.items():
    print(k, v)
# ==>
for item in person_dictionary.items():
    k, v = item
    print(k, v)
# zip() example
for color, ratio in zip(colors, ratios):
    print(f"It's {ratio*100}% {color}.")
# In Python 3.0, the * operator was added to the multiple assignment syntax
numbers = [1, 2, 3, 4, 5, 6]
first, *rest = numbers
print(first)
print(rest)
# head, *middle, tail = numbers
# Shallow Unpacking
color, point = ("red", (1, 2, 3))
# Deep Unpacking
color, (x, y, z) = ("red", (1, 2, 3))
items = [1, 2, 3, 4, 2, 1]
for i, (first, last) in enumerate(zip(items, reversed(items))):
    print(i, first, last)
Module, Script, Packages and import
In Python 3, only absolute imports and explicit relative imports are allowed. Explicit relative imports work only when the top-level script (the module with __name__ == '__main__') has __package__ set. __package__ is set when the module is run from outside the package with python -m pkg.module
# examples
# See also: https://needone.app/python-importing-explained/
├── main.py
└── src
    ├── bar
    │   └── bar.py
    ├── foo
    │   └── foo.py
    └── sample.py
# python src/foo/foo.py
# file's directory 'src/foo/' will be added to sys.path
# project root will not be added to sys.path
# Only absolute imports are allowed, and only from the file's own directory
# python -m src.foo.foo
# project root (the current working directory) will be added to sys.path
# foo.py module will have __package__ set to 'src.foo', similar for all modules in 'src'
# Because of this, relative imports will work in all modules in 'src'
# relative imports will not work for modules outside package 'src' (like main.py); for those, just use absolute imports (as project root is in sys.path)
# python -m main
# same as above, except that main.py has no parent package (its __package__ is empty), so relative imports will not work in main.py (or any other root-level .py file)
# Modules in subpackages will have __package__ set as expected, thus relative imports will work within their top-level package
PYTHONPATH
- PYTHONPATH is used by the Python interpreter to determine where to look for modules to import.
  - PYTHONPATH only affects import statements
  - Use sys.path to see the resulting module search path (it includes the PYTHONPATH entries)
- PATH is used by the shell to determine which executables to run.
  - Use os.environ.get('PATH') to get the current PATH
In a Python script, how do I set PYTHONPATH?
Just add entries to sys.path.
or
set permanently: export PYTHONPATH="/Users/my_user/code"
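A minimal sketch of adding an entry to sys.path at runtime. The libs directory name is hypothetical. Note that this changes the search path for the current process only; it does not modify the PYTHONPATH environment variable.

```python
import sys
from pathlib import Path

# Hypothetical directory holding importable modules; adjust to your project.
extra_dir = str(Path.cwd() / "libs")

# Avoid adding the same entry twice when the code runs more than once.
if extra_dir not in sys.path:
    sys.path.insert(0, extra_dir)   # index 0 gives it priority over later entries
```

Use sys.path.append instead of insert(0, ...) if the new directory should be searched last rather than first.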
Implicit __init__
Starting with Python 3.3, Implicit Namespace Packages were introduced. These allow for the creation of a package without any __init__.py file.
Built-ins and Standard Library
Built-in Functions
- zip(), map() and filter() each return an iterator.
zip()
x = [1, 2, 3]
y = [4, 5, 6]
zipped = zip(x, y)
list(zipped)
# [(1, 4), (2, 5), (3, 6)]
map(), filter() & reduce()
list(map(lambda x: 'a '+ x, ['cat', 'dog', 'cow']))
# ['a cat', 'a dog', 'a cow']
list(filter(lambda x: 'o' in x, ['cat', 'dog', 'cow']))
# ['dog', 'cow']
from functools import reduce
reduce(lambda acc, x: f'{acc} | {x}', ['cat', 'dog', 'cow'])
# 'cat | dog | cow'
# map with multiple iterables
list(map(lambda a, b: a+b, [1, 2, 3], [5, 15, 25]))
# [6, 17, 28]
sorted()
Python lists have a built-in list.sort() method that modifies the list in-place. There is also a sorted() built-in function that builds a new sorted list from an iterable.
Both list.sort() and sorted() have a key parameter to specify a function to be called on each list element prior to making comparisons. This key is used for sorting purposes.
sorted([5, 2, 3, 1, 4])
# [1, 2, 3, 4, 5]
sorted("This is a test string from Andrew".split(), key=str.lower)
# ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
student_tuples = [
    ('john', 'A', 15),
    ('jane', 'B', 12),
    ('dave', 'B', 10),
]
sorted(student_tuples, key=lambda student: student[2])
# [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
json module
Write data to json file
import json
from inspect import stack
from pathlib import Path

def write_data_to_json_file(data, file_path=None):
    if not file_path:
        # file_path = Path(__file__).parent.resolve()/'fb_test_users.json'
        # OR
        # generate a dynamic file path named after the calling function
        file_path = Path(__file__).parent.resolve()/'p_{}.json'.format(stack()[1][3])
    with open(file_path, 'w') as f:
        json.dump(data, f, indent=4)
json.dumps()/json.loads() vs json.dump()/json.load()
json.dumps returns a JSON-encoded string from a Python data structure
json.loads returns a Python data structure from a JSON-encoded string
json.dump & json.load read/write from/to a file instead of a string
See: https://stackoverflow.com/a/32911369/3565194
JSONDecodeError: Expecting , delimiter
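The backslash is an escape character in JSON as well as in Python, so a serialized JSON string may contain backslashes. When a JSON document is written as a Python string literal, use a raw string (r'...') so that Python does not consume the backslashes before json.loads sees them.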
See: https://stackoverflow.com/questions/9156417/valid-json-giving-jsondecodeerror-expecting-delimiter
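One way this error can arise, as a small sketch with made-up data: the same JSON text parsed from a normal string literal and from a raw string literal.

```python
import json

# Normal literal: Python turns \" into a bare " before json.loads sees it,
# leaving an unescaped quote inside the JSON string value.
broken = '{"quote": "say \"hi\""}'
try:
    json.loads(broken)
    failed = False
except json.JSONDecodeError:
    failed = True

# Raw literal: the backslashes survive, so the JSON escapes stay intact.
parsed = json.loads(r'{"quote": "say \"hi\""}')
```

This only concerns JSON written as literals in source code; text read from a file or a network response already contains the backslashes as-is.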
# Serialization
with open(file_path, 'w') as f:
    data = {'person_id': 124}
    json.dump(data, f, indent=4)
# Deserialization
with open(file_path, 'r') as f:
    data = json.load(f)
    print(data)
# Trick:
# Serialization --> to String
# Handle TypeError: Object of type datetime is not JSON serializable
json.dumps(my_dictionary, indent=4, sort_keys=True, default=str)
# See: https://stackoverflow.com/a/36142844
itertools module
The Python itertools module is a collection of tools for handling iterators
# permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
# permutations(range(3)) --> 012 021 102 120 201 210
itertools.permutations(iterable, r=None)
# combinations('ABCD', 2) --> AB AC AD BC BD CD
# combinations(range(4), 3) --> 012 013 023 123
itertools.combinations(iterable, r)
# combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
itertools.combinations_with_replacement(iterable, r)
itertools.groupby(iterable, key=None)
itertools.product(iter_1, iter_2)
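groupby and product are listed above without examples; the following is a brief sketch with made-up sample data. Note that groupby only merges adjacent items that share a key, which is why the input is sorted by the same key first.

```python
from itertools import groupby, product

words = ["apple", "avocado", "banana", "blueberry", "cherry"]


def first_letter(word):
    return word[0]


# groupby only groups *adjacent* items with equal keys, so sort by that key first.
by_letter = {
    letter: list(group)
    for letter, group in groupby(sorted(words, key=first_letter), key=first_letter)
}

# product yields the Cartesian product, like nested for-loops.
pairs = list(product("AB", [1, 2]))
```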
logging module
See: https://realpython.com/python-logging/
# Root Logger
import logging
logging.basicConfig(format='%(name)s - %(levelname)s - %(message)s')
logging.warning('Attention! This is a warning.')
# Simple Custom Logger (Preferred)
import logging
logger = logging.getLogger(__name__)
c_handler = logging.StreamHandler()
c_handler.setLevel(logging.WARNING)
c_handler.setFormatter(logging.Formatter('%(name)s - %(levelname)s - %(message)s'))
logger.addHandler(c_handler)
logging with filters
See: https://stackoverflow.com/questions/879732/logging-with-filters
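A minimal sketch of a filter attached to a handler, assuming the goal is to drop records by message content. KeywordFilter and ListHandler are hypothetical helper classes written for this illustration; logging.Filter, logging.Handler, and addFilter are the standard library pieces they build on.

```python
import logging


class KeywordFilter(logging.Filter):
    """Hypothetical filter: drop any record whose message contains a keyword."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword

    def filter(self, record):
        # Returning False drops the record; True lets it through.
        return self.keyword not in record.getMessage()


class ListHandler(logging.Handler):
    """Hypothetical handler that collects messages in a list (for the demo)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("filter_demo")
logger.setLevel(logging.INFO)
logger.propagate = False          # keep the demo output away from the root logger
handler = ListHandler()
handler.addFilter(KeywordFilter("password"))
logger.addHandler(handler)

logger.info("user logged in")
logger.info("password=hunter2")   # dropped by the filter
```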
Context Managers
Tutorial: https://jeffknupp.com/blog/2016/03/07/python-with-context-managers/
Real world Uses: http://arnavk.com/posts/python-context-managers/
from contextlib import contextmanager
@contextmanager
def my_db_manager(db_path):
    # do setup here like getting, opening etc
    db_obj = get_db_obj_from_path(db_path)
    try:
        # this yield object will be assigned to variable after 'as' in 'with' statement
        yield db_obj
    finally:
        db_obj.close()
# usage
with my_db_manager("path/to/db") as db:
    print('doing something with db:', db)
# The same my_db_manager implemented as a custom context manager class
# NOTE: above, my_db_manager is a generator function; here it is a class
class my_db_manager:
    def __init__(self, db_path):
        self.db_path = db_path
    def __enter__(self):
        self.db_obj = get_db_obj_from_path(self.db_path)
        # NOTE:
        # if any exception is raised inside __enter__, the body of the with statement will not run,
        # thus the __exit__ method will also not be called
        # this return object will be assigned to variable after 'as' in 'with' statement
        return self.db_obj
    def __exit__(self, *exc):
        self.db_obj.close()
We should use context managers when we see any of the following patterns in our code:
- Open - Close (see the example above)
- Lock - Release
- Change - Reset
- Enter - Exit
- Start - Stop
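As an illustration of the Change - Reset pattern from the list above, here is a sketch that temporarily switches the working directory. The name working_directory is made up for this example.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the current working directory, then restore it."""
    previous = os.getcwd()
    os.chdir(path)          # Change
    try:
        yield
    finally:
        os.chdir(previous)  # Reset, even if the with-body raised


before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.getcwd()
after = os.getcwd()
```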
ContextDecorator
# It lets you define a context manager using the class-based approach, but inheriting from contextlib.ContextDecorator. By doing so, you can use your context manager with the with statement as normal or as a function decorator.
from contextlib import ContextDecorator
class makeparagraph(ContextDecorator):
    def __enter__(self):
        print('<p>')
        return self
    def __exit__(self, *exc):
        print('</p>')
        return False
@makeparagraph()
def emit_html():
    print('Here is some non-HTML')
emit_html()
contextlib.nullcontext
See: https://stackoverflow.com/questions/45187286/how-do-i-write-a-null-no-op-contextmanager-in-python
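A short sketch of where contextlib.nullcontext (available since Python 3.7) can help: a with statement whose context manager is optional. The increment function and the counter dict are hypothetical.

```python
import threading
from contextlib import nullcontext


def increment(counter, lock=None):
    """Increment counter['n'], holding lock only if one was supplied."""
    # nullcontext() is a do-nothing stand-in, so one 'with' covers both cases.
    with lock if lock is not None else nullcontext():
        counter["n"] += 1
    return counter["n"]


counter = {"n": 0}
increment(counter)                      # no locking
increment(counter, threading.Lock())    # guarded by a real lock
```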
ExitStack
The primary use case for ExitStack is the one given in the class documentation: supporting a variable number of context managers and other cleanup operations in a single with statement.
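The variable-number case could be sketched as follows; read_first_lines and the throwaway files are assumptions made for the demo.

```python
import os
import tempfile
from contextlib import ExitStack


def read_first_lines(paths):
    """Open every file in paths at once and return each file's first line."""
    with ExitStack() as stack:
        # enter_context() enters each manager and registers its exit for cleanup.
        files = [stack.enter_context(open(p)) for p in paths]
        return [f.readline().rstrip("\n") for f in files]


# Demo with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        path = os.path.join(tmp, f"file_{i}.txt")
        with open(path, "w") as f:
            f.write(f"line {i}\n")
        paths.append(path)
    first_lines = read_first_lines(paths)
```

All files are closed when the with block exits, in reverse order of opening, even if one of the later open() calls fails.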
String and Things
Byte String vs Unicode String
(TODO: replace with table)
In Python3
unicode string --> str --> 'hello'
byte string --> bytes --> b'hello'
In Python2
unicode string --> unicode --> u'hello'
byte string --> str --> 'hello'
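In Python 3, converting between the two types goes through encode and decode. A brief sketch with a made-up sample string:

```python
text = "h\u00e9llo"               # str: a sequence of Unicode code points ('héllo')
data = text.encode("utf-8")       # bytes: the UTF-8 encoding of the text

# The accented letter is one character but two UTF-8 bytes.
lengths = (len(text), len(data))
round_trip = data.decode("utf-8")
```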
Miscellaneous
- The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab)
- In addition to u, b, and r, Python 3.6 introduced the f prefix (f-strings) for string formatting.
tmp = 44
print(f'The temperature is {tmp} Celsius')
Regular Expressions
Regular Expressions
A regex (often called a pattern) is a sequence of characters that defines a search pattern.
If a substring can be expressed in terms of the regex, then the substring is said to have matched the regex, or the regex is said to have matched the substring. (Also, the substring and the regex are said to be matches of one another.)
Most letters and characters will simply match themselves.
CheatSheet
https://www.debuggex.com/cheatsheet/regex/python
Metacharacters
. ^ $ * + ? \ | ( ) { } [ ]
Most metacharacters are not active inside [ ]. So to match one literally, escape it with \ or wrap it in [ ], e.g. \+ or [+].
Repeating Things
? * + {m,n}
Repetitions such as * and + are greedy; when repeating a RE, the matching engine will try to repeat it as many times as possible. To m
Info and Tips
Regular expressions are compiled into pattern objects
Always use raw strings for RegEx to avoid backslash plague.
match() - Determine if the RE matches at the beginning of the string.
search() - Scan through a string, looking for first location where this RE matches.
findall() - Find all substrings where the RE matches, and return them as a list.
match() and search() return a match object if the match is found, None otherwise.
match object important methods:
group() --> Return the string matched by the RE
span() --> Return a tuple containing the (start, end) positions of the match
start() --> Return the starting position of the match
end() --> Return the ending position of the match
If A and B are regular expressions, A|B will match any string that matches either A or B. Crow|Servo will match either 'Crow' or 'Servo', not 'Cro', a 'w' or an 'S', and 'ervo'.
\b Word boundary.
import re
p = re.compile('[a-z]+')   # p is a pattern object
m = p.search('tempo')      # m is a match object
# <==> (equivalent module-level call)
m = re.search('[a-z]+', 'tempo')
# both kinds of functions accept compilation flags like re.DOTALL, re.IGNORECASE
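The difference between match(), search(), and findall(), together with the match object methods listed above, can be seen in a small sketch. The sample text is made up.

```python
import re

pattern = re.compile(r"\d+")
text = "order 66 shipped 3 items"

at_start = pattern.match(text)       # None: the string does not start with a digit
first = pattern.search(text)         # match object for the first run of digits
everything = pattern.findall(text)   # list of all matched substrings

found = (first.group(), first.span(), first.start(), first.end())
```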
Grouping
()
You can repeat the contents of a group with a repeating qualifier, such as *, +, ?, or {m,n}. For example, (ab)* will match zero or more repetitions of ab.
To determine the group number, just count the opening parenthesis characters, going from left to right.
# group() is the same as group(0): the whole match, as if all ()s were removed
>>> import re
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
# group() can be passed multiple group numbers at a time
>>> m.group(2,1,2)
('b', 'abc', 'b')
Lookahead Assertions
Lookahead assertions are zero-width assertions (i.e. they only check whether the contained regex matches at the current position, without consuming any characters).
(?=...)
Positive lookahead assertion. This succeeds if the contained regular expression, represented here by ..., successfully matches at the current location, and fails otherwise. But, once the contained expression has been tried, the matching engine doesn’t advance at all; the rest of the pattern is tried right where the assertion started.
(?!...)
Negative lookahead assertion. This is the opposite of the positive assertion; it succeeds if the contained expression doesn’t match at the current position in the string.
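Both kinds of lookahead applied to a made-up list of file names, as a small sketch:

```python
import re

files = "main.py foo.txt bar.py"

# Positive lookahead: names followed by '.py'; the suffix is checked, not consumed.
python_stems = re.findall(r"\w+(?=\.py\b)", files)

# Negative lookahead: full file names whose extension is not 'py'.
non_python = re.findall(r"\b\w+\.(?!py\b)\w+", files)
```

Because the assertion is zero-width, the '.py' suffix is not part of the strings returned by the first call.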
Modifying Strings
split()
Split the string into a list, splitting it wherever the RE matches
sub()
Find all substrings where the RE matches, and replace them with a different string
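A minimal sketch of both functions on made-up input strings:

```python
import re

# Split on a comma or semicolon followed by optional whitespace.
parts = re.split(r"[,;]\s*", "a, b;c,  d")

# Collapse every run of whitespace into a single space.
squeezed = re.sub(r"\s+", " ", "too   many\t spaces")
```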
highlight regex matches
import re
from colorama import Back, Style
def highlight_regex_matches(pattern, text, print_output=True):
    output = text
    len_inc = 0
    for match in pattern.finditer(text):
        start, end = match.start() + len_inc, match.end() + len_inc
        output = output[:start] + Back.YELLOW + Style.BRIGHT + output[start:end] + Style.RESET_ALL + output[end:]
        len_inc = len(output) - len(text)
    if print_output:
        print(output)
    else:
        return output