Some Python tips for experienced developers🐍
This is a collection of tips from a notion document for onboarding new developers at my current work that I started to help experienced developers who may not know the ins and outs of python. It's not exhaustive or supposed to contain unique insights you couldn't find elsewhere, it's just that after seeing many developers leave "Protip: Do (x)" in slack, I decided it'd be a good idea to collect them together. It's also mostly here for my benefit, so I keep a copy of it somewhere for my own reference.
There's quite a lot pythonic python that is non-obvious and just requires you to 'be in the know'. This is a collection of comments seen repeatedly in code review or a given as a helpful tip on slack. Hopefully it will help new comers get up to speed on common mistakes (that we've all made) and avoid them.
Meta
If you write a python PROTIP into slack and you get all the emoji internet points, then everyone agrees with you. Give a victory lap and add it to these docs!
If you're going to add a tip,
- Make it short.
- Write in a friendly style.
- Add code examples.
- Give the underlying reasoning behind the tip.
- Links to fuller blog posts and stackoverflow answers are welcome!
Logging
Don't format the string before sending it to the logger
Prefer
logger.info("Order: %s for User:%s went wrong", order.id, user.id)
over
logger.info("Order: {} for User: {} went wrong".format(order.id, user.id)) logger.info("Order:%s for User: %s went wrong" % (order.id, user.id)) logger.info(f"Order: {order.id} for User:{user.id} went wrong")
The former only formats the string if the logger gets evaluated, the latter evaluates the string formatting before it gets sent to the logger
Formatting python log messages - Reinout van ReesSentry is also able to group messages with the same template together, so the first would result in a single sentry error for all user and order combination.
The last three would have a different sentry entry for each order/user combo.
Use logger.exception
when logging inside exception handlers
If you're handling an exception and want this data to appear in sentry, the logger.error(exc_info=True) and logger.exception() are equivalent.
Use raise from
for reraising exceptions with a different type
raise from
is often useful when you want to capture a third party library exception but raise it as an internal exception.
def some_django_rest_api():
# Our APIs should only raise APIException types.
try:
stripe.Charge.something():
except StripeError as e:
raise PaymentExternalError("blah") from e
This means the original exception detail and traceback will appear in sentry.
Use LogEvery
for repeated log messages
Logging the status syncing of 50k users can result in many repeated messages in logging/datadog. You can use LogEvery
to group the messages together.
total_count = qs.count()
with LogEvery(log_every=10, total_count=total_count) as le:
for each in qs.iterator():
... do something ...
le.tick()
# Or, for the common case:
for each in LogEvery(log_every=10, total_count=total_count).iter(qs):
... do something ...
Structuring Your Code
utils.py and util.py considered evil
If you are going to create a file called utils.py or add any function to an already exiting one: think again
putting things into utils makes it a bag full of everything from simple datetime calculations to sophisticated validators and Class mixins.
utils is a nondescriptive name so nobody including you will know what has been put there in two weeks
there is always a better place with a better name than utils
if something is more generic than the project you are working in/requires sharing - move it to a shared library, but please don't call it utils there 🙃
Some examples from when we have already moved from various utils:
file_size_validator
was moved tohelpers/validators
GatherSuppressedErrorsFromFormMixin
tohelpers/mixins
calculate_days_until
tohelpers/datetime
a few topic related files can reside in their own directory, so we have helpers/forms/ with form related modules
Also related reads:
- Naming Things | DevIQ
- What's the differences between helpers and utils? · Issue #808 · erikras/react-redux-universal-hot-example · GitHub
- design - Is having 'Util' classes a cause for concern? - Software Engineering Stack Exchange
Style
Avoid long calculations or database actions for @property
SomeModel.hello
gives no indication to the reader that hello performs cpu/memory intensive calculations, IO or database access. Most readers would assume that SomeModel.hello
is just a class attribute. It's better to leave it as a method SomeModel.hello()
. Naming is difficult, but SomeModel.calculate_hello()
or SomeModel.fetch_hello()
helps indicate to other developers its doing something else under the hood.
Avoid returning errors, prefer to raise an exception and the callee can handle that instead in a try/catch block.
Best practice in python for return value on error vs. success - Stack Overflowdef i_love_horses(horses):
errors = []
for horsie in horses:
if not is_a_horse(horsie):
errors.append(HorseException("not a horse")
return errors
# callee
errors = i_love_horses(my_horses)
if errors:
# do some error handling
def i_love_horses(horses):
errors = []
for horsie in horses:
if not is_a_horse(horsie):
errors.append(HorseException("not a horse")
if errors:
raise ValidationError("those are not horses", errors=errors)
# callee
try:
i_love_horses(my_horses):
except ValidationError:
log
We've got exceptions in python, so use them to indicate errors, in python use if/else for error handling costs more than try/except see
Using try vs if in python - Stack OverflowTesting
Factoryboy
Use pytest factory fixtures instead of importing factories
from blah.factories import CampaignFactory:
def test_something_bad(): # Boo!
CampaignFactory.create()
def test_something_good(campaign_factory): # Yay!
campaign_factory.create()
Use model fixtures when you just need a default instance
The two below are equivalent (in most cases)
def test_a_thing(campaign_factory):
campaign = campaign_factory.create()
do_something(campaign)
def test_a_thingy(campaign):
do_something(campaign)
The first uses the factory fixture to create a model instance, the second is the model fixture automatically created by pytest-factory boy see (https://pytest-factoryboy.readthedocs.io/en/latest/#model-fixture)
Subfactories allows generation of objects that span lookups
use:
travel_manager = factory.SubFactory(TravelManagerFactory)
over:
travel_manager = factory.LazyFunction(lambda: TravelManagerFactory.create())
which means we can use sub properties:
WaitListFactory.create(travel_manager__name='tomek')
Use SelfAttribute to generate models with the same parent object
class BasketFactory:
user = factory.SubFactory(UserFactory)
class OrderFactory:
user = factory.SubFactory(UserFactory)
basket = factory.SubFactory(BasketFactory)
order = OrderFactory.create()
In this example order.user != order.basket.user
You can make the Order have the same ambassador as the factory by using SelfAttribute
class BasketFactory:
user = factory.SubFactory(UserFactory)
class OrderFactory:
user = factory.SelfAttribute('..basket.user')
basket = factory.SubFactory(BasketFactory)
order = OrderFactory.create()
Strings
Avoid +=
when joining/concatenating strings, use .join
How slow is Python's string concatenation vs. str.join? - Stack Overflow
Use format_map when string formatting with dict data
If your data is in a dictionary, perhaps from a JSON file, then str.format_map() is your new best friend.
Pretty way:
print('{name} works at {status} {company}'.format_map(info))Less pretty:
print(f'{info["name"]} works at {info["status"]} {info["company"]}')
From Raymond Hettinger's tweet
You could also do:
print('{name} works at {status} {company}'.format(**info)
But this generates an extra copy of the dictionary with the **info
, see Built-in Types — Python 3.8.3 documentation
Django
Don't fetch an object if you don't need it
Django creates a field for foreign keys called {your_field}_id
,
which you can often use to avoid fetching the related object.
Model field reference | Django documentation | Django
If you have a null=true
field, you probably want it blank=true
null controls whether the field IS NULL on the database level
blank controls whether the field is required on forms (and hence the admin)
python - differentiate null=True, blank=True in django - Stack Overflow
Avoid NULL
CharFields
Note: for CharFields you don't want null or blank to be True. Having a CharField with null value makes it have two types of empty values because both '' and None is an empty state, but one was filled in and the second wasn't.
From Django docs:
Avoid using null on string-based fields such as CharField and TextField. If a string-based field has null=True, that means it has two possible values for “no data”: NULL, and the empty string. In most cases, it’s redundant to have two possible values for “no data;” the Django convention is to use the empty string, not NULL. One exception is when a CharField has both unique=True and blank=True set. In this situation, null=True is required to avoid unique constraint violations when saving multiple objects with blank values.
Model Meta
ordering is not free
Model Meta options | Django documentation | Django
if you add an ordering as a Meta option, it will apply to all queries. Even if you need an ordered queryset for your particular usage, it might not be needed all possible queries. It's better just to order it for specifc case.
Consider django-fsm when adding models status fields
If you need to add a status field to a model, consider using django-fsm (see Order.status for an example)
GitHub - viewflow/django-fsm: Django friendly finite state machine supportPrefer properties for date based status fields
If you’re adding a field like is_paid or is_complete or is_smushed, it’s almost always better to have a datetime field paid_at, completed_at, smushed_at and expose boolean as a model property:
class Order(models.Model):
paid_at = models.DateTimeField(null=True)
@property
def is_paid(self):
return self.paid_at is not None
Always assume new type will be added to a set of choices
If you're adding a status, option, choice or type-like field, it is best to account for the possibility that requirements will change and another option is going to be added later. You can raise a NotImplementedError
to account for any status field
def your_function():
if reservation.status == OPEN:
....
elif reservation.status == COMPLETED:
do_something()
else:
raise NotImplemented()
Often when a new option is added, the developer adding the new field will not be aware of all the areas the set of choices are used. So we need a way for a test to fail so that we catch these problems in development rather than being surprised and react to it in production.
@pytest.mark.parametrize('status', Reservation.SomeBookableStatus.values)
def test_your_function():
...
If you parametrize
all the values in your choices rather than the ones that currently exist, then if a new choice is added, test_your_function
will fail with a NotImplemented
error.
Celery (ugh)
Use per task queues instead of general ones
This means we can easily fire up one of dynos to work through specific queues/tasks if a queue suddenly gets very large. Also in RabbitMQ one queue == one process == one cpu, so using multiple queues lets us make use of multiple CPUs.
Avoid doing requests to multiple external services in one task / queue
If one of the external services becomes unavailable, we want to avoid being blocked by that - try to separate external calls to separate tasks, and each of these tasks to separate queues. In that case we can process at least part of the bigger task.
Use retries, prepare for tasks which may fail
Consider the scenario when task fails - because of some external service being down, or for example db state being different than the one task expects. Use https://docs.celeryproject.org/en/latest/userguide/tasks.html#retrying Retry mechanism with backoff to be sure that the failed task will be processed when external service recovers.
Prepare for complete failure of the task. Log the failure. Will it affect user flow? If yes, prepare a separate mechanism like cron, which will respawn the tasks which failed completely.
Avoid renaming/removing celery tasks
Do NOT rename celery/consumer tasks, it’s a destructive operation.You might risk loosing all pending messages in the queue.
Read #7 for more info here
Common Issues Using Celery (And Other Task Queues) - Adam Johnsonif you have to rename, do it in two steps:
- create proxy task with the same name (which will be calling new tasks with same payload)
- remove proxy task after some time (in next releases)
Requests
Add a default timeout when making requests
requests does not a timeout by default!
Quickstart — Requests 2.24.0 documentation
Define a mapping of constants TIMEOUTS in a configuration file and use those entries for specific timeouts.
Postgres
Avoid CharFields with max_length use CharFieldUnlimited
There's no performance difference in postgres between varchar(n) varchar and text, so prefer the latter two over the former.
Use CharFieldUnlimited when you want single line django admin input widgets and TextField if you want multiline text fields.