Python Core language

Miscellaneous

is vs ==

PEP 8 Guide:

  • Comparisons to singletons like None should always be done with 'is' or 'is not', never the equality operators.

    • How many singletons are there? Five: None, True, False, NotImplemented and Ellipsis
  • Also, beware of writing "if x" when you really mean "if x is not None"

    • e.g. when testing whether a variable or argument that defaults to None was set to some other value. The other value might have a type (such as a container) that could be false in a boolean context!
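A minimal sketch of that pitfall. The function names `add_item` and `add_item_fixed` are hypothetical. With `if not bucket`, an empty list passed in by the caller counts as "not set" and is silently replaced:

```python
def add_item(item, bucket=None):
    # Buggy: a caller-supplied empty list is falsy, so it is replaced by a new list
    if not bucket:
        bucket = []
    bucket.append(item)
    return bucket


def add_item_fixed(item, bucket=None):
    # Correct: only replace the default when nothing was passed
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


mine = []
add_item('a', mine)
print(mine)  # [] : the caller's list was never touched

add_item_fixed('a', mine)
print(mine)  # ['a']
```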

if var is False vs if not var:

var = []  # try False, None, 0, "" ...

# 1 - printed only if var is the False object itself
if var is False:
    print('nice')

# 2 - printed for any falsy value, e.g. False, None, [], {}, "", 0, 0.0
if not var:
    print('cool')

null checking a variable

# 1 - true if a is not None
if a is not None:
    pass
# 2 - true if a is truthy
if a:
    pass

*args and **kwargs

*args allows for any number of optional positional arguments, which will be assigned to a tuple named args.

**kwargs allows for any number of optional keyword arguments, which will be in a dict named kwargs.
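A small sketch of packing in a function definition; `describe` is a hypothetical name:

```python
def describe(*args, **kwargs):
    # args is a tuple of the extra positional arguments,
    # kwargs is a dict of the extra keyword arguments
    return args, kwargs


print(describe(1, 2, color='red'))  # ((1, 2), {'color': 'red'})
print(describe())                   # ((), {})
```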

Packing vs Unpacking

  • In a function definition, * packs extra positional arguments into a tuple and ** packs extra keyword arguments into a dictionary.
  • In a function call, * unpacks an iterable into positional arguments and ** unpacks a dictionary into keyword arguments, to be received by the function definition.

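Trick: * unpacking works in function calls and assignments; ** works in function calls. Since Python 3.5 (PEP 448), * also works inside list, tuple and set displays, and ** inside dict displays.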
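A sketch of unpacking on the calling side, plus unpacking inside literals (Python 3.5+, PEP 448). `point` is a hypothetical function:

```python
def point(x, y, z=0):
    return (x, y, z)


coords = [1, 2]
options = {'z': 3}

print(point(*coords))             # (1, 2, 0): * unpacks an iterable into positional args
print(point(*coords, **options))  # (1, 2, 3): ** unpacks a dict into keyword args

# Unpacking inside literals
merged_list = [*coords, *range(3, 5)]  # [1, 2, 3, 4]
merged_dict = {**options, 'w': 4}      # {'z': 3, 'w': 4}
print(merged_list, merged_dict)
```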

Tuple Unpacking

Tuple unpacking, also called "Multiple assignment", can be used with any iterable.

x, y = 10, 20
y, x = x, y      # swap without a temporary variable
print(x, y)      # 20 10
point = 10, 20, 30
x, y, z = point
print(x, y, z)   # 10 20 30

person_dictionary = {'name': 'Ada', 'age': 36}  # example data
for k, v in person_dictionary.items():
    print(k, v)
# is equivalent to
for item in person_dictionary.items():
    k, v = item
    print(k, v)

# zip() example
colors = ['red', 'blue']  # example data
ratios = [0.25, 0.75]
for color, ratio in zip(colors, ratios):
    print(f"It's {ratio*100}% {color}.")

# In Python 3.0, the * operator was added to the multiple assignment syntax (PEP 3132)
numbers = [1, 2, 3, 4, 5, 6]
first, *rest = numbers
print(first)  # 1
print(rest)   # [2, 3, 4, 5, 6]

head, *middle, tail = numbers  # head = 1, middle = [2, 3, 4, 5], tail = 6

# Shallow Unpacking
color, point = ("red", (1, 2, 3))

# Deep Unpacking
color, (x, y, z) = ("red", (1, 2, 3))


items = [1, 2, 3, 4, 2, 1]
for i, (first, last) in enumerate(zip(items, reversed(items))):
    print(i, first, last)

Module, Script, Packages and import

In Python 3, only absolute imports and explicit relative imports (from . import x) are allowed. Explicit relative imports only work in a module whose __package__ is set. For the top-level script (the module with __name__ == '__main__'), __package__ is set only when you run it from outside the package with python -m pkg.module.

# examples
# See also: https://needone.app/python-importing-explained/

├── main.py
└── src
    ├── bar
    │   └── bar.py
    ├── foo
    │   └── foo.py
    └── sample.py

# python src/foo/foo.py
# the script's directory 'src/foo/' is added to sys.path
# the project root is NOT added to sys.path
# relative imports fail (no parent package); only absolute imports work,
# and they resolve against 'src/foo/' and the other sys.path entries

# python -m src.foo.foo
# the current working directory (the project root, if you run from there) is added to sys.path
# foo.py gets __package__ = 'src.foo'; every module imported from 'src' likewise gets its package set
# because of this, relative imports work in all modules inside 'src'
# relative imports do not work in modules outside the 'src' package (like main.py); use absolute imports there (the project root is on sys.path)

# python -m main
# same as above, except main.py has no parent package (__package__ is ''), so relative imports do not work in main.py (or any other root-level .py file)
# modules in subpackages still get __package__ as expected, so relative imports work within their top-level package

PYTHONPATH

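  • PYTHONPATH is an environment variable whose directories the Python interpreter adds to its module search path.

    • PYTHONPATH only affects where import statements look for modules
    • Use sys.path to see the full current search path (it includes the PYTHONPATH entries); use os.environ.get('PYTHONPATH') to read the variable itself
  • PATH is used by the shell to determine which executables to run.

    • Use os.environ.get('PATH') to get the current PATH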

In a Python script, how do I set PYTHONPATH?

Just add entries to sys.path at runtime.

or

set it in the shell (add this line to your shell profile, e.g. ~/.bashrc, to make it permanent): export PYTHONPATH="/Users/my_user/code"
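A minimal sketch of the runtime approach. Changes to sys.path only affect the current process, unlike exporting PYTHONPATH. The directory name `extra_modules` is a hypothetical example:

```python
import os
import sys
from pathlib import Path

# Hypothetical directory of extra modules, under the current working directory
extra_dir = str(Path.cwd() / 'extra_modules')

if extra_dir not in sys.path:
    sys.path.insert(0, extra_dir)  # searched before the other sys.path entries

print(sys.path[0])
print(os.environ.get('PYTHONPATH'))  # the environment variable itself is unchanged
```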

Implicit __init__

Starting with Python 3.3, Implicit Namespace Packages were introduced. These allow for the creation of a package without any __init__.py file.
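A sketch that builds a throwaway package directory with no __init__.py and imports from it. The names `nspkg` and `greet` are hypothetical:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create <tmp>/nspkg/greet.py, deliberately WITHOUT nspkg/__init__.py
root = Path(tempfile.mkdtemp())
pkg_dir = root / 'nspkg'
pkg_dir.mkdir()
(pkg_dir / 'greet.py').write_text(
    "def hello():\n    return 'hello from a namespace package'\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the import system sees the new files

greet = importlib.import_module('nspkg.greet')
print(greet.hello())
```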

Built-ins and Standard Library

Built-in Functions

  • zip(), map() and filter() return iterators (lazy and single-pass), not lists.
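A quick sketch of what "returns an iterator" means in practice: values are produced on demand, and once consumed they are gone.

```python
squares = map(lambda n: n * n, [1, 2, 3])

print(next(squares))  # 1 : values are produced lazily, one at a time
print(list(squares))  # [4, 9] : list() consumes the rest
print(list(squares))  # [] : an iterator can only be consumed once
```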

zip()

x = [1, 2, 3]
y = [4, 5, 6]
zipped = zip(x, y)
list(zipped)
# [(1, 4), (2, 5), (3, 6)]

map(), filter() & reduce()

list(map(lambda x: 'a '+ x, ['cat', 'dog', 'cow']))
# ['a cat', 'a dog', 'a cow']

list(filter(lambda x: 'o' in x, ['cat', 'dog', 'cow']))
# ['dog', 'cow']

from functools import reduce
reduce(lambda acc, x: f'{acc} | {x}', ['cat', 'dog', 'cow'])
# 'cat | dog | cow'
# map with multiple iterables: the function takes one argument per iterable
list(map(lambda a, b: a+b, [1, 2, 3], [5, 15, 25]))
# [6, 17, 28]

sorted()

Python lists have a built-in list.sort() method that modifies the list in-place. There is also a sorted() built-in function that builds a new sorted list from an iterable.

Both list.sort() and sorted() have a key parameter that specifies a function to be called on each element before comparisons are made; the elements are then ordered by these key values instead of by the elements themselves.

sorted([5, 2, 3, 1, 4])
# [1, 2, 3, 4, 5]

sorted("This is a test string from Andrew".split(), key=str.lower)
# ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']

student_tuples = [
    ('john', 'A', 15),
    ('jane', 'B', 12),
    ('dave', 'B', 10),
]
sorted(student_tuples, key=lambda student: student[2])
# [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
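Because a key may return a tuple, you can sort on several fields at once. A sketch using operator.itemgetter from the standard library as an alternative to the lambda; student_tuples is redefined so the snippet runs on its own:

```python
from operator import itemgetter

student_tuples = [
    ('john', 'A', 15),
    ('jane', 'B', 12),
    ('dave', 'B', 10),
]

# By grade ascending, then by age ascending
print(sorted(student_tuples, key=itemgetter(1, 2)))
# [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

# By age, descending
print(sorted(student_tuples, key=itemgetter(2), reverse=True))
# [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
```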

json module

Write data to json file

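import json
from inspect import stack
from pathlib import Path


def write_data_to_json_file(data, file_path=None):
    if file_path is None:
        # Fixed path alternative:
        # file_path = Path(__file__).parent.resolve() / 'fb_test_users.json'
        # OR
        # generate a dynamic file path named after the calling function, e.g. p_my_test.json
        # (needs __file__, so run this from a script, not the interactive shell)
        caller_name = stack()[1].function
        file_path = Path(__file__).parent.resolve() / f'p_{caller_name}.json'
    with open(file_path, 'w') as f:
        json.dump(data, f, indent=4)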

json.dumps()/json.loads() vs json.dump()/json.load()

json.dumps

returns a json encoded string from a python data structure

json.loads

returns a python data structure from a json encoded string

json dump & load

read/write from/to file instead of string

See: https://stackoverflow.com/a/32911369/3565194
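A small round trip with the string-based pair; the example data is made up:

```python
import json

person = {'name': 'Ada', 'languages': ['Python', 'C'], 'active': True}

text = json.dumps(person)  # Python dict -> JSON string
print(text)  # {"name": "Ada", "languages": ["Python", "C"], "active": true}

restored = json.loads(text)  # JSON string -> Python dict
print(restored == person)  # True
```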

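JSONDecodeError: Expecting ',' delimiter

Backslash (\) is an escape character in JSON as well as in Python string literals. When JSON text containing escapes (e.g. \" inside a value) is pasted into a normal Python string literal, Python consumes the backslashes first and the JSON becomes invalid. So always use raw strings (r'...') for JSON text written directly in source code before decoding it.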

See: https://stackoverflow.com/questions/9156417/valid-json-giving-jsondecodeerror-expecting-delimiter
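A sketch of the difference. In the normal literal, Python turns \" into a plain ", so json.loads sees unescaped quotes inside the value:

```python
import json

broken = '{"quote": "say \"hi\""}'   # Python sees: {"quote": "say "hi""}
raw = r'{"quote": "say \"hi\""}'     # backslashes reach the JSON parser intact

try:
    json.loads(broken)
except json.JSONDecodeError as exc:
    print('normal literal fails:', exc.msg)

print(json.loads(raw))  # {'quote': 'say "hi"'}
```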

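import json
from datetime import datetime

file_path = 'data.json'  # example path

# Serialization: Python object -> JSON file
with open(file_path, 'w') as f:
    data = {'person_id': 124}
    json.dump(data, f, indent=4)

# Deserialization: JSON file -> Python object
with open(file_path, 'r') as f:
    data = json.load(f)
    print(data)  # {'person_id': 124}

# Trick:
# Serialization --> to String
# Handle TypeError: Object of type datetime is not JSON serializable
# default=str is called for any object json cannot encode natively
my_dictionary = {'id': 1, 'created_at': datetime(2020, 1, 1)}  # example data
json.dumps(my_dictionary, indent=4, sort_keys=True, default=str)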

# See: https://stackoverflow.com/a/36142844

itertools module

The Python itertools module is a collection of tools for handling iterators.

import itertools

# Signatures, with example results in the comments

# permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
# permutations(range(3)) --> 012 021 102 120 201 210
itertools.permutations(iterable, r=None)

# combinations('ABCD', 2) --> AB AC AD BC BD CD
# combinations(range(4), 3) --> 012 013 023 123
itertools.combinations(iterable, r)

# combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
itertools.combinations_with_replacement(iterable, r)

# groupby('AAAABBBCCDAABBB') --> keys A B C D A B (groups consecutive runs only)
itertools.groupby(iterable, key=None)

# product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
itertools.product(*iterables, repeat=1)
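A runnable sketch of the two least obvious tools. groupby only groups consecutive items with equal keys, so sort by the same key first; product behaves like nested for-loops. The word list is made up:

```python
from itertools import groupby, product

# Consecutive runs only: the second 'A' starts a new group
print([k for k, _ in groupby('AABA')])  # ['A', 'B', 'A']


def first_letter(word):
    return word[0]


words = ['banana', 'apple', 'blueberry', 'avocado', 'cherry']
# Sort by the same key first to get exactly one group per key
by_letter = {
    k: list(g)
    for k, g in groupby(sorted(words, key=first_letter), key=first_letter)
}
print(by_letter)  # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

print(list(product('AB', [1, 2])))  # [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
```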

logging module

See: https://realpython.com/python-logging/

# Root Logger
import logging

logging.basicConfig(format='%(name)s - %(levelname)s - %(message)s')
logging.warning('Attention! This is a warning.')
# Simple Custom Logger (Preferred)
import logging

logger = logging.getLogger(__name__)

c_handler = logging.StreamHandler()
c_handler.setLevel(logging.WARNING)
c_handler.setFormatter(logging.Formatter('%(name)s - %(levelname)s - %(message)s'))
logger.addHandler(c_handler)

# handled by c_handler (and also by the root logger's handler, if one is configured)
logger.warning('This is a warning')

logging with filters

See: https://stackoverflow.com/questions/879732/logging-with-filters
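A minimal sketch of a handler-level filter, assuming you want to drop noisy health-check messages. `NoHealthCheckFilter` and the logger name are hypothetical. A filter's filter(record) method returns a falsy value to drop the record:

```python
import logging


class NoHealthCheckFilter(logging.Filter):
    """Drop records whose message mentions 'healthcheck'."""

    def filter(self, record):
        return 'healthcheck' not in record.getMessage()


logger = logging.getLogger('filter_demo')
logger.setLevel(logging.INFO)

handler = logging.StreamHandler()
handler.addFilter(NoHealthCheckFilter())  # applies only to this handler
logger.addHandler(handler)

logger.info('GET /healthcheck 200')  # dropped by the filter
logger.info('GET /users 200')        # emitted
```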

Context Managers

Tutorial: https://jeffknupp.com/blog/2016/03/07/python-with-context-managers/

Real world Uses: http://arnavk.com/posts/python-context-managers/

from contextlib import contextmanager

@contextmanager
def my_db_manager(db_path):
    # do setup here, e.g. getting or opening the resource
    # get_db_obj_from_path is a placeholder for your own helper (e.g. sqlite3.connect)
    db_obj = get_db_obj_from_path(db_path)
    try:
        # this yield object will be assigned to variable after 'as' in 'with' statement
        yield db_obj
    finally:
        db_obj.close()

# usage
with my_db_manager("path/to/db") as db:
    print('doing something with db:', db)
# The same my_db_manager, implemented as a class-based context manager
# NOTE: above, my_db_manager is a generator function; here it is a class
class my_db_manager:
    def __init__(self, db_path):
        self.db_path = db_path

    def __enter__(self):
        self.db_obj = get_db_obj_from_path(self.db_path)
        # NOTE:
        # if __enter__ itself raises an exception, the body of the with statement never runs
        # and __exit__ is not called either

        # this return object will be assigned to variable after 'as' in 'with' statement
        return self.db_obj

    def __exit__(self, *exc):
        self.db_obj.close()

We should use context managers when we see any of the following patterns in our code:

  • Open - Close
  • Lock - Release
  • Change - Reset
  • Enter - Exit
  • Start - Stop
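A sketch of the "Change - Reset" pattern with @contextmanager: temporarily set an environment variable and restore the previous state on exit, even if the body raises. `temp_env_var` and `DEMO_APP_MODE` are hypothetical names:

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env_var(name, value):
    old = os.environ.get(name)  # remember the previous state
    os.environ[name] = value    # change
    try:
        yield
    finally:
        # reset
        if old is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = old


with temp_env_var('DEMO_APP_MODE', 'test'):
    print(os.environ['DEMO_APP_MODE'])  # test
print(os.environ.get('DEMO_APP_MODE'))  # None, assuming it was unset before
```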

ContextDecorator

# It lets you define a context manager using the class-based approach, but inheriting from contextlib.ContextDecorator. By doing so, you can use your context manager with the with statement as normal or as a function decorator. 
from contextlib import ContextDecorator

class makeparagraph(ContextDecorator):
    def __enter__(self):
        print('<p>')
        return self

    def __exit__(self, *exc):
        print('</p>')
        return False

@makeparagraph()
def emit_html():
    print('Here is some non-HTML')

emit_html()

contextlib.nullcontext

See: https://stackoverflow.com/questions/45187286/how-do-i-write-a-null-no-op-contextmanager-in-python
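contextlib.nullcontext (Python 3.7+) is a no-op context manager, handy when a with statement should only sometimes manage a resource. A sketch with a hypothetical `write_report` helper: with no path it writes to sys.stdout, which must not be closed, so stdout is wrapped in nullcontext:

```python
import sys
from contextlib import nullcontext


def write_report(lines, path=None):
    # Open (and later close) a file if a path is given; otherwise use stdout as-is
    cm = open(path, 'w') if path is not None else nullcontext(sys.stdout)
    with cm as out:
        for line in lines:
            print(line, file=out)


write_report(['total: 3', 'failed: 0'])  # goes to stdout, which stays open
```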

ExitStack

The primary use case for ExitStack is the one given in the class documentation: supporting a variable number of context managers and other cleanup operations in a single with statement.

See: https://docs.python.org/3.6/library/contextlib.html#simplifying-support-for-single-optional-context-managers
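A sketch of that use case: open a variable number of files in one with statement, all closed on exit. `read_first_lines` is a hypothetical helper; stack.callback registers arbitrary cleanup:

```python
from contextlib import ExitStack


def read_first_lines(paths):
    """Return the first line of each file; every opened file is closed on exit."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(p)) for p in paths]
        return [f.readline().rstrip('\n') for f in files]


with ExitStack() as stack:
    stack.callback(print, 'cleanup runs last')
    print('body')
# prints "body", then "cleanup runs last"
```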

Strings and Things

Byte String vs Unicode String

            Unicode string         Byte string
Python 3    str      'hello'       bytes   b'hello'
Python 2    unicode  u'hello'      str     'hello'
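In Python 3 you convert between the two types explicitly with encode() and decode(). A small sketch, assuming UTF-8:

```python
text = 'héllo'               # str (unicode string)
data = text.encode('utf-8')  # str -> bytes
print(data)                  # b'h\xc3\xa9llo'
print(data.decode('utf-8'))  # héllo
print(len(text), len(data))  # 5 6 : 'é' takes two bytes in UTF-8
```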

Miscellaneous

  • The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab)
  • In addition to the u, b and r prefixes, Python 3.6 introduced the f prefix for f-strings (formatted string literals).

    tmp = 44
    print(f'The temperature is {tmp} Celsius')

Regular Expressions

A regex (often called a pattern) is a sequence of characters that defines a search pattern.

If a substring can be described by the regex, the substring is said to match the regex (equivalently, the regex matches the substring).

Most letters and characters will simply match themselves.

CheatSheet

https://www.debuggex.com/cheatsheet/regex/python

Metacharacters

. ^ $ * + ? \ | ( ) { } [ ]

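Most metacharacters are not active inside a character class [ ] (the exceptions are \, ], - and a leading ^). So to match a metacharacter literally, either escape it with \ or put it inside [ ], e.g. \+ or [+].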

Repeating Things

? * + {m,n}

Repetitions such as * and + are greedy: when repeating a RE, the matching engine will try to repeat it as many times as possible. To make a repetition non-greedy (repeat as few times as possible), add ? after it: *?, +?, ??, {m,n}?.
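Greedy versus non-greedy repetition side by side: appending ? to a repetition makes it match as little as possible.

```python
import re

html = '<b>bold</b>'
print(re.match(r'<.*>', html).group())   # <b>bold</b> : greedy, runs to the last '>'
print(re.match(r'<.*?>', html).group())  # <b> : non-greedy, stops at the first '>'
```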

Info and Tips

Regular expressions are compiled into pattern objects.

Always use raw strings for RegEx to avoid backslash plague.

match() - Determine if the RE matches at the beginning of the string.

search() - Scan through a string, looking for first location where this RE matches.

findall() - Find all substrings where the RE matches, and return them as a list.

match() and search() return a match object if the match is found, None otherwise.

match object important methods:

group() —> Return the string matched by the RE

span() —> Return a tuple containing the (start, end) positions of the match

start() —> Return the starting position of the match

end() —> Return the ending position of the match
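A short sketch of the functions and match-object methods listed above:

```python
import re

m = re.search(r'\d+', 'abc 123 def')
print(m.group(), m.span(), m.start(), m.end())  # 123 (4, 7) 4 7

print(re.match(r'\d+', 'abc 123 def'))    # None : match() only tries at position 0
print(re.findall(r'\d+', 'a1 b22 c333'))  # ['1', '22', '333']
```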

If A and B are regular expressions, A|B will match any string that matches either A or B. Crow|Servo will match either 'Crow' or 'Servo', not 'Cro', a 'w' or an 'S', and 'ervo'.

\b Word boundary.
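A sketch of alternation and \b together; the sample strings are made up:

```python
import re

text = 'cat concat cats cat.'
print(re.findall(r'cat', text))      # ['cat', 'cat', 'cat', 'cat'] : also inside words
print(re.findall(r'\bcat\b', text))  # ['cat', 'cat'] : whole words only

print(re.findall(r'Crow|Servo', 'Crow, Servo, Crervo'))  # ['Crow', 'Servo']
```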

import re

p = re.compile('[a-z]+')  # p is a pattern object
m = p.search('tempo')     # m is a match object

# <==> equivalent module-level shortcut
m = re.search('[a-z]+', 'tempo')

# both forms accept compilation flags like re.DOTALL, re.IGNORECASE
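A sketch of the two flags mentioned, passed both to a module-level function and to re.compile (flags combine with |):

```python
import re

print(re.search('hello', 'Say HELLO', re.IGNORECASE).group())  # HELLO

print(re.match('a.b', 'a\nb'))                           # None : '.' skips newlines by default
print(repr(re.match('a.b', 'a\nb', re.DOTALL).group()))  # 'a\nb'

p = re.compile('a.b', re.DOTALL | re.IGNORECASE)
print(p.match('A\nB') is not None)  # True
```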

Grouping

()

You can repeat the contents of a group with a repeating qualifier, such as *, +, ?, or {m,n}. For example, (ab)* will match zero or more repetitions of ab.

To determine the group number, just count the opening parenthesis characters, going from left to right.

# group(0), or group() with no argument, is the whole match
# (the same text the pattern would match with all ()s removed)
>>> import re
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'

# group() can be passed multiple group numbers at a time
>>> m.group(2, 1, 2)
('b', 'abc', 'b')
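Two related details, sketched: groups() returns all subgroups at once, and a repeated group such as (ab)* only keeps its last repetition.

```python
import re

m = re.match('(a(b)c)d', 'abcd')
print(m.groups())  # ('abc', 'b') : every subgroup, from group 1 up

r = re.fullmatch('(ab)*', 'ababab')
print(r.group(0), r.group(1))  # ababab ab
```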

Lookahead Assertions

Lookahead assertions are zero-width assertions: they only check whether the contained regex matches at the current position, without consuming any characters.

(?=...)

Positive lookahead assertion. This succeeds if the contained regular expression, represented here by ..., successfully matches at the current location, and fails otherwise. But, once the contained expression has been tried, the matching engine doesn’t advance at all; the rest of the pattern is tried right where the assertion started.

(?!...)

Negative lookahead assertion. This is the opposite of the positive assertion; it succeeds if the contained expression doesn’t match at the current position in the string.
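A sketch of both assertions. The lookahead text is checked but not included in the match:

```python
import re

# Positive: words followed by ':' (the ':' itself is not part of the match)
print(re.findall(r'\w+(?=:)', 'host: example, port: 80'))  # ['host', 'port']

# Negative: words starting with 'foo' that are NOT followed by 'bar'
print(re.findall(r'\bfoo(?!bar)\w*', 'foobar foobaz foo'))  # ['foobaz', 'foo']
```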

Modifying Strings

split()

Split the string into a list, splitting it wherever the RE matches

sub()

Find all substrings where the RE matches, and replace them with a different string
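A short sketch of both; in the replacement string of sub(), \1 and \2 refer back to captured groups:

```python
import re

print(re.split(r'[,;\s]+', 'a, b;c  d'))         # ['a', 'b', 'c', 'd']
print(re.sub(r'\d+', '#', 'room 12, floor 3'))   # room #, floor #
print(re.sub(r'(\w+)@(\w+)', r'\2 at \1', 'user@host'))  # host at user
```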

highlight regex matches


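import re
from colorama import Back, Style


def highlight_regex_matches(pattern, text, print_output=True):
    """Wrap every match of the compiled `pattern` in `text` with terminal highlight codes."""
    output = text
    len_inc = 0  # characters added so far by the inserted escape codes
    for match in pattern.finditer(text):
        # shift the original match positions by the escape codes already inserted
        start, end = match.start() + len_inc, match.end() + len_inc
        output = output[:start] + Back.YELLOW + Style.BRIGHT + output[start:end] + Style.RESET_ALL + output[end:]
        len_inc = len(output) - len(text)

    if print_output:
        print(output)
    else:
        return output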