
Getting Started with FastAPI-Users and Alembic

(Updated 2022-03-15)

FastAPI-Users is a user registration and authentication system that makes adding user accounts to your FastAPI project easier and secure by default. It comes with support for various ORMs, and contains all the models, dependencies, and routes you need for registration, activation, email verification, and more.

When setting up your database, you can use SQLAlchemy (or your preferred ORM), plus the provided models, to create the necessary tables very quickly; as you can see from the example in the docs, it doesn’t take much to get everything you need.

In an actively developed project, though, your database is likely to go through many changes over time. Alembic is a tool, used alongside SQLAlchemy, that helps manage database migrations.

This article will cover how to get started with FastAPI-Users and Alembic in a Poetry project. I’ll be using a SQLite database in the examples, because it’s readily available in Python, it’s a good database, and it will illustrate one of Alembic’s features.

Start by running poetry new or poetry init to create a new project.

Adding Dependencies

First of all, let’s add our dependencies to our Poetry project. For the sake of this tutorial, I’ll be pinning specific version numbers. You should consider what versions you want your project to be compatible with when adding your dependencies.

$ poetry add fastapi==0.74.0
$ poetry add fastapi-users[sqlalchemy2]==9.2.5
$ poetry add databases[sqlite]==0.5.5
$ poetry add alembic==1.7.7

Creating the FastAPI App

(Update: since I first wrote this, FastAPI-Users has made some fairly significant changes that make it much more flexible, but require a bit more setup. I’ve kept it as one file below, but I highly recommend seeing the full example in the docs, where it’s separated into different files)

In your project’s source code directory, create a file main.py and put the following code in it.

from typing import AsyncGenerator, Optional

import databases
from fastapi import Depends, FastAPI, Request
from fastapi_users import models as user_models
from fastapi_users import db as users_db
from fastapi_users import BaseUserManager, FastAPIUsers
from fastapi_users.authentication import (
    AuthenticationBackend,
    CookieTransport,
    JWTStrategy,
)
from fastapi_users.db import SQLAlchemyUserDatabase
import sqlalchemy as sa
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.ext.declarative import DeclarativeMeta, declarative_base
from sqlalchemy.orm import sessionmaker

DATABASE_URL = "sqlite+aiosqlite:///./test.db"
SECRET = "SECRET"


class User(user_models.BaseUser):
    pass


class UserCreate(user_models.BaseUserCreate):
    pass


class UserUpdate(User, user_models.BaseUserUpdate):
    pass


class UserDB(User, user_models.BaseUserDB):
    pass


database = databases.Database(DATABASE_URL)

Base: DeclarativeMeta = declarative_base()


class UserTable(Base, users_db.SQLAlchemyBaseUserTable):
    pass


engine = create_async_engine(DATABASE_URL, connect_args={"check_same_thread": False})


def get_jwt_strategy() -> JWTStrategy:
    return JWTStrategy(secret=SECRET, lifetime_seconds=3600)


auth_backend = AuthenticationBackend(
    name="jwt",
    transport=CookieTransport(),
    get_strategy=get_jwt_strategy,
)


class UserManager(BaseUserManager[UserCreate, UserDB]):
    user_db_model = UserDB
    reset_password_token_secret = SECRET
    verification_token_secret = SECRET

    async def on_after_register(self, user: UserDB, request: Optional[Request] = None):
        print(f"User {user.id} has registered.")

    async def on_after_forgot_password(
        self, user: UserDB, token: str, request: Optional[Request] = None
    ):
        print(f"User {user.id} has forgot their password. Reset token: {token}")

    async def on_after_request_verify(
        self, user: UserDB, token: str, request: Optional[Request] = None
    ):
        print(f"Verification requested for user {user.id}. Verification token: {token}")


async_session_maker = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)


async def get_async_session() -> AsyncGenerator[AsyncSession, None]:
    async with async_session_maker() as session:
        yield session


async def get_user_db(session: AsyncSession = Depends(get_async_session)):
    yield SQLAlchemyUserDatabase(UserDB, session, UserTable)


async def get_user_manager(user_db: SQLAlchemyUserDatabase = Depends(get_user_db)):
    yield UserManager(user_db)


app = FastAPI()
fastapi_users = FastAPIUsers(
    get_user_manager,
    [auth_backend],
    User,
    UserCreate,
    UserUpdate,
    UserDB,
)


@app.on_event("startup")
async def startup():
    await database.connect()


@app.on_event("shutdown")
async def shutdown():
    await database.disconnect()


app.include_router(
    fastapi_users.get_auth_router(auth_backend), prefix="/auth/jwt", tags=["auth"]
)
app.include_router(fastapi_users.get_register_router(), prefix="/auth", tags=["auth"])
app.include_router(
    fastapi_users.get_reset_password_router(),
    prefix="/auth",
    tags=["auth"],
)
app.include_router(
    fastapi_users.get_verify_router(),
    prefix="/auth",
    tags=["auth"],
)
app.include_router(fastapi_users.get_users_router(), prefix="/users", tags=["users"])

This is basically just the example given by FastAPI-Users, condensed into one file, and minus a few things, including the code that creates the database table; we’ll be using Alembic to do that instead. If you’re not already familiar with this, I recommend going through the configuration docs, where each section of the code is explained.

You can start up the server using poetry run uvicorn projectname.main:app and make sure everything is working. To test it, I navigated to the docs in my browser (http://127.0.0.1:8000/docs). This should show all the FastAPI-Users routes and how to use them. They won’t work yet, since the database hasn’t been created. So, it’s time to set that up!

Initializing Alembic

In the top level directory of your project, run this command:

$ poetry run alembic init alembic

This will create some directories and files. Note that the name passed to the init command can be whatever you want: maybe you’re going to be managing two different databases, and you want to name each directory after the database that it will apply to.

For a description of the files and directories inside the new alembic directory, take a look at the tutorial.

Now, we need to edit alembic.ini to tell it to use our SQLite database. Find the line that looks like

sqlalchemy.url = driver://user:pass@localhost/dbname

and replace it with

sqlalchemy.url = sqlite:///./test.db

Now Alembic is all set up and ready to go!

Creating a Migration Script

We can use the command alembic revision to have Alembic create our first migration script. Passing the -m flag gives the script a title.

$ poetry run alembic revision -m "Create FastAPI-Users user table"

This should create a file named something like {identifier}_create_fastapi_users_user_table.py, which contains the beginnings of a migration script. All we have to do is write the actual migration.

One of the columns in the user table is a custom type, so at the top of the file add this import: from fastapi_users_db_sqlalchemy import GUID
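
For reference, after adding that line, the imports at the top of the generated script should look something like this (the revision identifiers are generated, so yours will differ):

"""Create FastAPI-Users user table

Revision ID: ...
Revises:
Create Date: ...

"""
from alembic import op
import sqlalchemy as sa

from fastapi_users_db_sqlalchemy import GUID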

Now, it’s time to write the actual migration! This comes in the form of upgrade() and downgrade() functions. The downgrade() function isn’t required, but if you don’t write it, you won’t be able to revert the database to a previous version.

def upgrade():
    op.create_table( # This tells Alembic that, when upgrading, a table needs to be created.
        "user", # The name of the table.
        sa.Column("id", GUID, primary_key=True), # The column "id" uses the custom type imported earlier.
        sa.Column(
            "email", sa.String(length=320), unique=True, index=True, nullable=False
        ),
        sa.Column("hashed_password", sa.String(length=72), nullable=False),
        sa.Column("is_active", sa.Boolean, default=True, nullable=False),
        sa.Column("is_superuser", sa.Boolean, default=False, nullable=False),
        sa.Column("is_verified", sa.Boolean, default=False, nullable=False),
    )


def downgrade():
    op.drop_table("user") # If we need to downgrade the database--in this case, that means restoring the database to being empty.

Running the Migration

To update to the most recent revision, we can use this command:

$ poetry run alembic upgrade head

It’s also possible to pass a specific version to upgrade or downgrade to, or to use relative identifiers to, for example, upgrade to the revision two steps ahead of the current one, as shown below.
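
These all use standard Alembic revision identifiers:

$ poetry run alembic upgrade +2
$ poetry run alembic downgrade -1
$ poetry run alembic downgrade base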

If the upgrade was successful, your test.db database file should now contain a user table with all of the columns specified in the script. One quick way to check is with the sqlite3 command-line shell:
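
$ sqlite3 test.db '.schema user'

This prints the CREATE TABLE statement for the user table, so you can confirm that it matches the migration.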

Register an Account

To test that the database is now set up, let’s try creating an account.

First, run poetry install to make sure that your project is installed and ready to go. Then, go ahead and start up uvicorn again. You can use anything you want to send the data to the API, but I think the easiest way to verify everything is working is through the docs.

Go to http://127.0.0.1:8000/docs#/auth/register_register_auth_register_post and click on Try it out. Default values are supplied, but you can edit them if you want. When you’re ready, click Execute and see what your server responds with; if all is well, it should respond with a 201 status code.

Congratulations, you’ve now used Alembic to migrate your database from nonexistence to having a user table suitable for use with FastAPI-Users! From here, you can continue making revisions based on the needs of your project. It’s totally possible to do that by repeating this same process: create a new migration script using alembic revision, then fill in the upgrade() function, then run the migration. However, there is an alternative that many people find easier. Let’s explore that.

Using Alembic’s Autorevision

Suppose, now that you have the basic columns required by FastAPI-Users, you want to update your user table to also have a name column.

When Alembic ran our first migration, it also created its own table in the database to track which revision the database is currently at. When we make changes to our models, we can tell Alembic to compare the new models to the current database, and automatically create a revision based on the differences.

It’s not perfect, and it isn’t intended to be: in some cases, you’ll still need to edit the output to reflect your intentions.

To get started, we need to edit env.py in the alembic directory. All you have to do is import the Base variable from main.py and assign it to the target_metadata variable. There’s already a line in env.py that looks like target_metadata = None; just change that to target_metadata = Base.metadata. For more detailed instructions, the tutorial has you covered.
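
The relevant part of env.py ends up looking like this (yourproject stands in for whatever your package is actually named):

from yourproject.main import Base

target_metadata = Base.metadata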

Updating the Models

Now that Alembic can see our model, it’s time to actually change the model. To add a name attribute, there are a few lines of code we need to add to main.py.

The Pydantic models User and UserCreate need to be updated. These are what FastAPI uses to determine what data the API expects to be sent, for example, when a user is created.

Under User, replace pass with the attribute name: Optional[str].

Under UserCreate, replace pass with the attribute name: str.

Now, users will be required to send a name when they make an account, and can optionally update their name by making a PATCH request to /users/me.

That takes care of the FastAPI side, but we still need to add the column to the SQLAlchemy model. In the UserTable class, replace pass with

    name = sa.Column(
        sa.String(length=100),
        server_default=sa.sql.expression.literal("No name given"),
        nullable=False,
    )

So, on the database side, the name column is going to be a string, not nullable, and with a default value of “No name given”. Adding new NOT NULL columns to an existing table is a tricky business, and having a default value may not be the right way to do it, but that’s another post for another time.

The Migration Script

To generate the migration script, run this command:

$ poetry run alembic revision --autogenerate -m "Added name column"

The new script should look like this:

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column('user', sa.Column('name', sa.String(length=100), server_default=sa.text("'No name given'"), nullable=False))
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_column('user', 'name')
    # ### end Alembic commands ###

Looks good! After running the migration with poetry run alembic upgrade head, you can run uvicorn again and check to see that the docs are updated and you’re able to add a user with a name successfully.

Batch Operations

(Update: this section was true when I first wrote it. However, as of version 3.35.0, SQLite supports DROP COLUMN. If you run the below downgrade command with the latest versions of SQLite, it will work instead of failing. I’m leaving this section for historic reasons, and because SQLite still doesn’t support other operations, so the information is still useful. You can check the docs for more information about what is supported and why)

Remember how I said that SQLite would help illustrate one of the features of Alembic? Now is that time.

To see why Batch Operations are necessary, try downgrading the latest revision.

$ poetry run alembic downgrade -1

If you got an exception like sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) near "DROP": syntax error, then you’ve discovered that SQLite has some limitations when it comes to editing tables in certain ways. One of those limitations is that you can’t drop a column.

Instead, you have to create a new table minus the column you want to drop, copy the data to the new table, drop the old table, and finally rename the new table to the old name.

That sounds like a lot of work, but we can use Alembic’s batch operations to do it for us.

Open up the “Added name column” revision and edit the downgrade() function.

def downgrade():
    with op.batch_alter_table("user", schema=None) as batch_op:
        batch_op.drop_column("name")

On SQLite, this migration will go through the necessary create-copy-drop-rename procedure to drop the column. On databases that support dropping columns directly, it will just drop the column without the extra effort.

There is a lot more to batch operations than that, which you can read all about in the docs, but that covers the basic idea.

If you want, you can have Alembic output autogenerated migration scripts as batch operations by editing env.py and passing the argument render_as_batch=True to context.configure() in the run_migrations_online() function.
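
In context, that section of env.py would look roughly like this (a sketch; everything except render_as_batch is already in the generated file):

    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            render_as_batch=True,  # render autogenerated operations as batch operations
        )

        with context.begin_transaction():
            context.run_migrations()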

Conclusion

In this article, we’ve learned about FastAPI-Users, Alembic, and the basics of how to use Alembic to manage your database. At this point, you should have a functional project with a database ready to accept user registrations. You should have the tools needed to start expanding on that database and upgrade it incrementally as you develop your project.

Finally, here are a few links that I think would be a good place to start learning more.
  • https://www.chesnok.com/daily/2013/07/02/a-practical-guide-to-using-alembic/
  • https://speakerdeck.com/selenamarie/alembic-and-sqlalchemy-sane-schema-management?slide=45
  • https://alembic.sqlalchemy.org/en/latest/cookbook.html#building-an-up-to-date-database-from-scratch
  • https://www.viget.com/articles/required-fields-should-be-marked-not-null/
  • https://stackoverflow.com/questions/3492947/insert-a-not-null-column-to-an-existing-table
  • https://www.red-gate.com/hub/product-learning/sql-prompt/problems-with-adding-not-null-columns-or-making-nullable-columns-not-null-ei028


Investigating Python Memory Usage

Alternate title: Why Is This Program Using So Much Memory and What Can I Do About It??

This post grew out of work on my speech transcriber project, which aims to transcribe longer recordings while using less memory by segmenting audio based on DeepSegment. It’s still very much a work in progress.

While testing on an AWS EC2 instance with 2GB of RAM, though, it crashed with a memory error, even though the workload shouldn’t require nearly that much. This post is about how I diagnosed and solved the problem, and what tools are available.

Getting an Overview of the Problem

First, I narrowed down my code to something that was more easily repeatable.

from pydub import AudioSegment

segment = AudioSegment.from_file("test_audio.mp3") # Open the 57MB mp3
segment = segment.set_frame_rate(16000) # Resample; pydub returns a new AudioSegment rather than mutating

All of the graphs below were based on running this simple test.

Now, it’s time to introduce psrecord, which is capable of measuring the CPU and RAM usage of a process. It can attach to an already-running process with the command psrecord <pid> --plot plot.png, which is useful for peeking at a long-running process.

For our purposes, though, psrecord can start the process for us and monitor it from start to finish. Just put the command to run in quotation marks in place of the pid. It’ll look like psrecord "python test_memory.py" --plot plot.png

Here’s what the resulting graph looks like:

[Graph: Pydub memory usage before changes]

The red line plots CPU usage (on the left) and the blue line memory usage (on the right). The peak memory usage is roughly 2,300MB. Definitely too much for my 2GB EC2 instance.

This is a good overview of the scope of the problem, and gives a baseline of CPU and time to compare to. In other words, if a change gets us below the 2GB mark on RAM, but suddenly takes longer to process, or uses more CPU, that’s something we want to be aware of.

Finding the Root of the Problem

What psrecord does not tell us is where the memory is being allocated in the program. What line(s) of code, specifically, are using up all of this memory?

That’s where Fil comes in. It produces a flamegraph, much like Py-Spy, but with memory usage instead of CPU. This will let us zoom in on the specific lines in pydub that allocate memory.
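
Installing and running Fil is straightforward (the package is named filprofiler, and fil-profile is its command-line entry point):

$ pip install filprofiler
$ fil-profile run test_memory.py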

(Note that Fil’s actual output is an SVG and much easier to use)

According to Fil, the peak memory was 2,147MB and there are a number of places that memory is allocated. Our goal, then, is to look through those places and see if any of them can be removed.

Diving into the Pydub Source

To do that, we’re going to have to dig into the source code and try to understand the flow of data. The following samples come from this file in the pydub repository.

def from_file(cls, file, format=None, codec=None, parameters=None, **kwargs):
    ...  # Open the file and convert it to the WAV format
    p_out = bytearray(p_out)  # Cast to bytearray to make it mutable
    fix_wav_headers(p_out)  # Mutate the WAV data to fix the headers
    obj = cls._from_safe_wav(BytesIO(p_out))  # Create the AudioSegment


def _from_safe_wav(cls, file):
    file, close_file = _fd_or_path_or_tempfile(file, 'rb', tempfile=False)
    file.seek(0)
    obj = cls(data=file)
    if close_file:
        file.close()
    return obj


def __init__(self, data=None, *args, **kwargs):
    ...
    else:
        # normal construction
        try:
            data = data if isinstance(data, (basestring, bytes)) else data.read()
    ...
    wav_data = read_wav_audio(data)


def read_wav_audio(data, headers=None):
    ...  # Read the headers to get various metadata to store in the WavData
    return WavData(audio_format, channels, sample_rate, bits_per_sample,
                   data[pos:pos + data_hdr.size])

When opening a file using AudioSegment.from_file, the flow is basically:

  1. Open the file and convert it to WAV.
  2. Cast the bytes to a bytearray, then mutate that bytearray to fix the WAV headers.
  3. Wrap the bytearray in a BytesIO, then use AudioSegment._from_safe_wav to create the instance of AudioSegment.
  4. _from_safe_wav makes sure the file is opened and at the beginning of the file, before constructing the AudioSegment using the data.
  5. __init__ reads the data from the BytesIO object.
  6. The data is passed to read_wav_audio to get headers extracted, so the raw data being operated on is only the audio data.
  7. read_wav_audio extracts the headers and returns them as part of a WavData object, along with the raw audio data. It cuts off the headers by slicing the bytes.

As Fil showed, there are several copies of the data being passed around. Some can’t really be avoided. For example, slicing bytes is going to make a copy.

The Solution

It took quite a bit of experimenting to arrive at the solution. I started by using a memoryview, which would allow the last step (slicing the data) to not make a copy. That worked for my use, but it broke a number of functions, so it wasn’t acceptable as a contribution.

My next try used a bytearray, which again allowed me to cut off the headers without making a big copy. This got closer (at least, most things didn’t break), but it did break Python 2.7 support. More importantly, it made AudioSegments mutable.

Finally, I realized that I was focusing on the wrong end of the stack. The last operation is naturally what drew my attention first, since it showed up as the cause of the exception when my program ran out of memory. However, there’s a much easier place to reduce copying earlier in the call stack.

Here’s how I changed from_file:

p_out = bytes(p_out)
obj = cls(p_out)

Yes, all that happened is I replaced the casting to BytesIO and the call to _from_safe_wav with casting back to bytes, then instantiating the class directly. If you look back at it, this is exactly what _from_safe_wav did. It just had several layers of indirection: wrapping the bytes in BytesIO, then reading them back later.

So, was that small change worth it? Let’s see what Fil says about it now.

I would say that a ~900MB savings in RAM is worthwhile!

And for completeness, here’s the psrecord graph:

[Graph: Pydub memory usage after changes]

As might be expected, removing things only made it faster. Memory usage peaks lower, and the whole program runs much faster. A lot of the run time seems to have been just copying data around, so that makes sense.

Lessons Learned

First, keep looking until you find the right tools for the job. When I first set out to understand the memory usage, the tools I reached for were designed more for finding memory leaks, which is a different category of memory problem. Finding the right tools made finding the solution much easier.

Second, slow down and think through the options. My initial efforts focused on only one portion of the possible locations that memory usage could be reduced, which ended up being the wrong place to focus.

On the other hand, don’t let analysis paralysis win. Even if it’s not clear where the solution might end up being, jumping in and experimenting can give you a better idea of what might work.

Third, don’t be afraid to explore whether an open source library could be improved for your use case! For small files, the overhead of making all those copies is not significant, so few people had likely looked into improving memory usage. Taking the time to explore the issue allowed me to make a contribution.

Thanks for reading! I would appreciate your feedback, or to hear about a tricky memory problem that you debugged.


Xonsh Alias: Set Up a Python Project

Recently, I was reading The Pragmatic Programmer, and the section on continual learning, especially learning how to use new tools, really stuck with me.

With that in mind, I turned my attention to ways to improve my process for starting up a new project, which is something I do fairly often to experiment.

There are several aspects of setting up a new project, and managing all of them manually can be repetitive, error-prone, and difficult. Some of the ones I wanted to take care of using tools include:

  • Setting up the project directory structure
  • Creating a virtual environment and activating it every time
  • Managing dependencies

The tools I decided to use are poetry and vox/autovox. Poetry takes care of a lot of project management issues, while vox allows me to use virtualenvs that play well with Xonsh. In the future, I’d also like to explore using cookiecutter for templates.

I tied all of these tools (plus git, of course) together into a Xonsh alias. If you’re not familiar with that, check out my introduction to Xonsh and my article about using an alias to filter Mut.py results.

xontrib load vox autovox

from pathlib import Path


def _create_project(args):
    project_name = args[0]
    poetry config virtualenvs.create false
    poetry new @(project_name)
    cd @(project_name)
    
    env_name = str(Path.cwd())[1:]
    print("Removing previous virtualenv, if it exists (KeyError means that it did not exist)")
    vox deactivate
    vox remove @(env_name)
    
    pyenv_path = Path($(pyenv root).strip()) / "versions"
    interpreter = $(pyenv version-name).split(":")[0]
    interpreter_path = str(pyenv_path / interpreter / "bin" / "python")
    print("Using Python " + interpreter + " at " + str(interpreter_path))
    
    vox new @(env_name) --interpreter @(interpreter_path)
    git init
    rm pyproject.toml
    poetry init

aliases["create_project"] = _create_project


@events.autovox_policy
def auto_based_on_dir(path, **_):
    venv = Path($HOME + "/.virtualenvs" + str(path))
    if venv.exists():
        return venv

The usage is simple: $ create_project project_name will use poetry to create a new project directory project_name, then create a virtual environment, initialize the git repository, remove the pyproject.toml made by poetry new, and finally run poetry init to interactively create a new pyproject.toml.

Most of this is pretty simple, but it takes care of several steps at once with one command, which allows me to jump right in to coding when I have a new idea or want to experiment with something.

The most complicated part is the creation of the virtual environment and registering an autovox policy to automatically activate the environment. Vox creates all virtual environments in ~/.virtualenvs. So, for example, if I start a project in /home/harrison/project_name, then a virtual environment gets created at ~/.virtualenvs/home/harrison/project_name.

The auto_based_on_dir function gets registered with autovox and activates the proper environment based on what directory I’m working in. It does this by checking whether a virtual environment based on a particular path exists, and returning the path to it if it does.

Conclusion

I’m excited to continue improving the tools I use in my projects. In particular, poetry seems like a good way to manage projects and publish them to PyPI. It only took a little bit of time to put this together, and I expect it to pay off many times over.

Switching to vox and using autovox to activate virtual environments should also save a lot of time. In the past, I’ve used pyenv virtualenv and manually activated environments as needed.

What tools do you use as part of your workflow?


Advanced Python Data Classes: Custom Tools

Python’s dataclasses module, added in 3.7, is a great way to create classes designed to hold data. Although they don’t do anything that a regular class couldn’t do, they take out a lot of boilerplate code and let you focus on the data.

If you aren’t already familiar with dataclasses, check out the docs. There are also plenty of great tutorials covering their features.

In this tutorial, we’re going to look at a way to write tools that extend dataclasses.

Let’s start with a simple dataclass that holds a UUID, username, and email address of a user.

from dataclasses import dataclass, field
import uuid


@dataclass
class UserData:
    username: str
    email: str
    _id: uuid.UUID = field(default_factory=uuid.uuid4)


if __name__ == "__main__":
    username = input("Enter username: ")
    email = input("Enter your email address: ")

    data = UserData(username, email)
    print(data)

This is pretty simple. Ask the user for a username and an email address, then show them the shiny new data class instance that we made using their information. The class will, by default, generate a unique id for every user.

But what if we have sneaky users who might try giving an invalid email address, just to break things?

It’s simple enough to extend data classes to support field validation. dataclass is just a decorator that takes a class and adds various methods and attributes to it, so let’s make our own decorator that does the same thing.

def validated_dataclass(cls):
    cls.__post_init__ = lambda self: print("Initializing!")
    cls = dataclass(cls)
    return cls

@validated_dataclass
class UserData:
...

Here, we add a simple __post_init__ method to the class, which will be called by the data class every time we instantiate the class. But how can we use this power to validate an email address?

This is where the metadata argument of a field comes in. Basically, it’s a dict that we can set when defining a field in the data class. It’s completely ignored by the regular dataclass implementation, so we can use it to include information about the field for our own purposes.

Here’s how UserData looks after adding a validator for the email field.

from dataclasses import dataclass, field
import uuid

def validate_email(value):
    if "@" not in value:
        raise ValueError("There must be an '@' in your email!")
    
    return value


@validated_dataclass
class UserData:
    username: str
    email: str = field(metadata={"validator": validate_email})
    _id: uuid.UUID = field(default_factory=uuid.uuid4)

Now the email field of the data class will carry around that validator function, so that anyone can access it. Let’s update the decorator to make use of it.

from dataclasses import dataclass, field, fields

def validated_dataclass(cls):
    cls = dataclass(cls)

    def _set_attribute(self, attr, value):
        # Run the field's validator, if one was provided in its metadata.
        for f in fields(self):
            if f.name == attr and "validator" in f.metadata:
                value = f.metadata["validator"](value)
                break

        object.__setattr__(self, attr, value)

    cls.__setattr__ = _set_attribute
    return cls

The new decorator replaces the regular __setattr__ with a function that first looks at the metadata of the fields. If there is a validator function associated with the attribute, it calls the function and uses its return value as the value to set.

The power of this approach is that now anybody can validate fields on their data classes by importing this decorator and defining a validator function in the metadata of their field. It’s a drop-in replacement to extend any data class.
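
For example, with the decorator and validator defined above:

user = UserData("someone", "someone@example.com")  # fine; validation also runs during __init__
user.email = "not-an-email"  # raises ValueError: There must be an '@' in your email!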

One downside to this is the performance cost. Even attributes that don’t need validation will run through the list of fields every time they’re set. In another article, I’ll look at how much of a cost this actually is, and explore some optimizations we can make to reduce the overhead.

Another downside is the potential lack of readability of setting metadata on every field. If that becomes a problem, you could try defining the metadata dict elsewhere, so the field would look like email: str = field(metadata=email_metadata).

The possible uses of metadata are limitless! Combined with custom decorators that use dataclass behind the scenes, we can add all sorts of functionality to data classes.

For serious validation needs, it’s still most likely to be better to use something like Pydantic or Marshmallow, rather than make your own. Both of them have either built-in support for data classes, or there are other packages available to add that support.

If you have any ideas for extending data classes, let me know in the comments!


Learning CPython Bytecode Instructions

Recently, I’ve been interested in learning some of the internal workings of CPython in order to more deeply understand Python and possibly contribute to it. I’ve also come across Nim and wanted to try a project using it. To accomplish two goals at once, I decided to write a toy Python VM using Nim to execute Python bytecode.

This post is where I intend to compile information about Python’s compilation step as I learn about it, as a reminder for myself and a resource for anyone else who might be curious.

What Is Python’s Compilation Step?

For a detailed overview of CPython and the steps a program goes through from source code to execution, check out this RealPython guide by Anthony Shaw. Part 3, in particular, describes the step in question here.

In short, and broadly speaking, the process goes source code -> lexing/tokenizing -> parsing to AST -> compiling to bytecode -> execution. You can compile code using Python’s built-in compile function, which results in a code object.
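
You can try this yourself in a few lines:

import dis

code = compile("hello = 3000", "test.py", "exec")
print(code.co_consts)  # (3000, None)
print(code.co_names)   # ('hello',)
dis.dis(code)          # prints the disassembly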

Here’s the JSON object that results from a code object compiled from the statement hello = 3000. This JSON contains everything needed to run the program. The most important items in this example are "code", which contains the opcodes; "consts", which is a list of constants used in the program; and "names", which is a list of variable names used in the program. This will make more sense later.

{
  "argcount": 0,
  "cellvars": [],
  "code": "d\u0000Z\u0000d\u0001S\u0000",
  "consts": [3000, null],
  "filename": "test.py",
  "firstlineno": 1,
  "flags": 64,
  "freevars": [],
  "kwonlyargcount": 0,
  "lnotab": "",
  "name": "<module>",
  "names": ["hello"],
  "nlocals": 0,
  "stacksize": 1,
  "varnames": []
}

Another helpful tool is dis, which outputs the opcodes of a compiled program.

For example, the code object, according to dis, looks like this:

  1           0 LOAD_CONST               0 (3000)
              2 STORE_NAME               0 (hello)
              4 LOAD_CONST               1 (None)
              6 RETURN_VALUE

The first column is the line number. The second column is the offset of the instruction within the bytecode; since the code alternates [opcode, arg, opcode, arg…], it increases by two each time. The third column is the name of the opcode. The fourth column is the argument for the opcode, which is usually an index into a different list. Finally, the fifth column is what that index points to.

Putting it all together:

The first opcode, LOAD_CONST, loads a constant from index 0 in the consts list, which is 3000, and puts it on the stack. The next opcode, STORE_NAME, pops a value (3000) off the stack and associates it with the name at index 0 of the names list, which is hello.

The next two opcodes load the constant None and return it, marking the end of the frame.

That’s a lot to digest, and we’ve only just managed to assign an integer to a variable!

Strategy

This is a big project, so I’m planning on doing it a bite at a time. At its current stage, my Nim VM can take the bytecode of programs that simply assign constants to variables, and the constant has to be either an integer or a string.

My strategy is to write a Python program that requires just a bit more functionality than what my VM currently implements. Then work on the VM until it can successfully run the program. Rinse and repeat.

I’ll try to write it in the best Nim that I can, but I won’t let analysis paralysis prevent me from going ahead and getting something working. Since I’m doing this to learn Nim, a lot of it probably isn’t going to be the most idiomatic, performant Nim code. That’s okay. I’ll try to fix things as I continue to learn.

When I finish implementing an opcode, or learn something useful about code objects and the like, I’ll update this article with what I’ve learned.

Credits/References

Thanks to Allison Kaptur for an excellent article describing a simple implementation of a Python VM written in Python. That pointed me to the actual VM, Byterun, which she worked on, though it was primarily written by Ned Batchelder. It has been a great help in understanding how all of this works. So, without further ado, here are my notes on terminology, code objects, and opcodes.

Definitions

  • Code Object: A collection of fields containing all the information necessary to execute a Python program.
  • Stack: A data structure holding the values that the VM is working on. You push values to the top of the stack to store them and pop values from the top of the stack to work with them.
  • Opcode: A byte that instructs the VM to take a particular action, such as pushing a value to the stack or popping the top two values from the stack and adding them together.
  • Jump: An opcode that “jumps,” i.e., skips to another part of the bytecode. This can be relative (jump forward 10 bytes from the current position, for example) or absolute (jump to the 23rd byte from the beginning).

Code Objects

Code objects consist of several fields containing various information about the program.

  • argcount: I’m not sure yet.
  • cellvars: I’m not sure yet.
  • code: A bytestring representing the opcodes and their arguments. Every other byte is an opcode, and in between each opcode is a byte for the argument.
  • consts: A list of constants. These could be integers, strings, None [null], etc. These are put onto the stack by the LOAD_CONST opcode.
  • filename: The name of the file which was compiled. When compiling from within a program, this is passed to the compile built-in function as an argument, so it could be any string, really.
  • firstlineno: The first line number. I’m not sure what this is for, yet. I’m assuming it has something to do with frames.
  • flags: I’m not sure yet.
  • freevars: I’m not sure yet.
  • kwonlyargcount: I’m not sure yet. I assume it has something to do with keyword-only arguments to functions.
  • lnotab: I’m not sure yet.
  • name: The name of the module. I assume this is related to importing.
  • names: A list of names of variables, which will be referenced by certain opcodes to associate variables with values.
  • nlocals: I’m not sure yet.
  • stacksize: I’m not sure yet. It’s possible that the max size of the stack is precomputed so that the data structure can be initialized to the correct size.
  • varnames: I’m not sure yet.

Opcodes

Note: sometimes the byte values skip ahead, leaving room for new opcodes to be inserted in the future. You can spot these gaps in the numbering in the table below.

Some of the information here comes from the dis module’s documentation. Some of it comes from the opcode module’s source code. Some of it comes from what I’ve learned through experimentation.

Name (Byte Value): Description

POP_TOP (1): Pop the top value from the stack.
ROT_TWO (2): Rotate the top two values of the stack. For example: [1, 2, 3, 4] -> [1, 2, 4, 3].
ROT_THREE (3): As above, but rotate the top three values.
DUP_TOP (4): Duplicate the top stack value.
DUP_TOP_TWO (5): Duplicate the top two stack values.
ROT_FOUR (6): As ROT_TWO, but rotate the top four values.
NOP (9): No operation. Does nothing. Used as a placeholder.
UNARY_POSITIVE (10): Unary operations take the top of the stack, apply an operation to it, then push the result back on the stack. This one applies the unary + operator (as in +x).
UNARY_NEGATIVE (11): Same as above, but negates the value (as in -x).
UNARY_NOT (12): Logically negates the top stack value (as in not x).
UNARY_INVERT (15): Inverts the top stack value (as in ~x).
BINARY_MATRIX_MULTIPLY (16): Binary operations take the top two values from the stack, perform an operation on them, then push the result onto the stack. This one performs matrix multiplication (the @ syntax, new in 3.5).
INPLACE_MATRIX_MULTIPLY (17): Performs an in-place matrix multiplication.
BINARY_POWER (19)
BINARY_MULTIPLY (20)
BINARY_MODULO (22)
BINARY_ADD (23)
BINARY_SUBTRACT (24)
BINARY_SUBSCR (25)
BINARY_FLOOR_DIVIDE (26)
BINARY_TRUE_DIVIDE (27)
INPLACE_FLOOR_DIVIDE (28)
INPLACE_TRUE_DIVIDE (29)
GET_AITER (50)
GET_ANEXT (51)
BEFORE_ASYNC_WITH (52)
BEGIN_FINALLY (53)
END_ASYNC_FOR (54)
INPLACE_ADD (55)
INPLACE_SUBTRACT (56)
INPLACE_MULTIPLY (57)
INPLACE_MODULO (59)
STORE_SUBSCR (60)
DELETE_SUBSCR (61)
BINARY_LSHIFT (62)
BINARY_RSHIFT (63)
BINARY_AND (64)
BINARY_XOR (65)
BINARY_OR (66)
INPLACE_POWER (67)
GET_ITER (68)
GET_YIELD_FROM_ITER (69)
PRINT_EXPR (70)
LOAD_BUILD_CLASS (71)
YIELD_FROM (72)
GET_AWAITABLE (73)
INPLACE_LSHIFT (75)
INPLACE_RSHIFT (76)
INPLACE_AND (77)
INPLACE_XOR (78)
INPLACE_OR (79)
WITH_CLEANUP_START (81)
WITH_CLEANUP_FINISH (82)
RETURN_VALUE (83)
IMPORT_STAR (84)
SETUP_ANNOTATIONS (85)
YIELD_VALUE (86)
POP_BLOCK (87)
END_FINALLY (88)
POP_EXCEPT (89)
STORE_NAME (90): All opcodes from here on have arguments. This operation pops the top value from the stack and associates it with a name in the names list. The argument is the index of the name.
DELETE_NAME (91): Deletes the association between a name and a value. The argument is the index of the name.
UNPACK_SEQUENCE (92): The argument is the number of tuple items.
FOR_ITER (93): The argument is a relative jump.
UNPACK_EX (94)
STORE_ATTR (95): The argument is the index of a name in the names list.
DELETE_ATTR (96): The argument is the index of a name in the names list.
STORE_GLOBAL (97): The argument is the index of a name in the names list.
DELETE_GLOBAL (98): The argument is the index of a name in the names list.
LOAD_CONST (100): Push a constant onto the stack. The argument is the index of a constant in the consts list.
LOAD_NAME (101): Push the value associated with a name onto the stack. The argument is the index of a name in the names list.
BUILD_TUPLE (102): Pop the top n values from the stack and build a tuple from them. The argument is the number of tuple items to pop. Push the resulting tuple onto the stack.
BUILD_LIST (103): Pop the top n values from the stack and build a list from them. The argument is the number of list items to pop. Push the resulting list onto the stack.
BUILD_SET (104): Pop the top n values from the stack and build a set from them. The argument is the number of set items to pop. Push the resulting set onto the stack.
BUILD_MAP (105): Pop the top n values from the stack and build a map (dictionary) from them. The argument is the number of dict entries. Push the resulting dict onto the stack.
LOAD_ATTR (106): The argument is the index of a name in the names list.
COMPARE_OP (107)
IMPORT_NAME (108)
IMPORT_FROM (109)
JUMP_FORWARD (110): The argument is the number of bytes to skip (a relative jump).
JUMP_IF_FALSE_OR_POP (111): The argument is the byte index to jump to (an absolute jump).
JUMP_IF_TRUE_OR_POP (112): The argument is the byte index to jump to (an absolute jump).
JUMP_ABSOLUTE (113): The argument is the byte index to jump to (an absolute jump).
POP_JUMP_IF_FALSE (114): The argument is the byte index to jump to (an absolute jump).
POP_JUMP_IF_TRUE (115): The argument is the byte index to jump to (an absolute jump).
LOAD_GLOBAL (116): The argument is the index of a name in the names list.
SETUP_FINALLY (122): The argument is the number of bytes to jump (a relative jump).
LOAD_FAST (124): The argument is the local variable number.
STORE_FAST (125): The argument is the local variable number.
DELETE_FAST (126): The argument is the local variable number.
RAISE_VARARGS (130): The argument is the number of raise arguments (1, 2, or 3).
CALL_FUNCTION (131)
MAKE_FUNCTION (132)
BUILD_SLICE (133)
LOAD_CLOSURE (135)
LOAD_DEREF (136)
STORE_DEREF (137)
DELETE_DEREF (138)
CALL_FUNCTION_KW (141)
CALL_FUNCTION_EX (142)
SETUP_WITH (143)
LIST_APPEND (145)
SET_ADD (146)
MAP_ADD (147)
LOAD_CLASSDEREF (148)
EXTENDED_ARG (144): Note: this one is out of order in opcode.py, so I’ve listed it out of order here, too.
BUILD_LIST_UNPACK (149)
BUILD_MAP_UNPACK (150)
BUILD_MAP_UNPACK_WITH_CALL (151)
BUILD_TUPLE_UNPACK (152)
BUILD_SET_UNPACK (153)
SETUP_ASYNC_WITH (154)
FORMAT_VALUE (155)
BUILD_CONST_KEY_MAP (156)
BUILD_STRING (157)
BUILD_TUPLE_UNPACK_WITH_CALL (158)
LOAD_METHOD (160)
CALL_METHOD (161)
CALL_FINALLY (162)
POP_FINALLY (163)

Getting Started with DeepSpeech on AWS

Recently, I’ve been working on a project using Python + DeepSpeech. In this article, I’ll share some considerations for setting this type of project up on AWS, including which instance types to look at. It took quite a bit of trial and error to figure out which one would work best!

What Is DeepSpeech?

DeepSpeech is a speech-to-text engine + model. In other words, it comes with everything you need to get started transcribing audio files to text.

It comes with Python bindings and a client, which you can use as a command line utility, or as an example of how to write your own Python program that uses DeepSpeech.

There are some limitations: the model requires WAV audio at 16,000 Hz. The client can use SoX to resample to 16,000 Hz if required, but it’s up to you to make sure the file is in the WAV format. My project uses Pydub to handle preprocessing audio files.
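
A minimal preprocessing step with Pydub might look like this (a sketch, not my project’s actual code; it assumes ffmpeg is available to decode the MP3):

from pydub import AudioSegment

audio = AudioSegment.from_file("recording.mp3")
# DeepSpeech's model expects 16,000 Hz, mono, 16-bit samples
audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
audio.export("recording.wav", format="wav")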

DeepSpeech on AWS?

There are a few considerations when putting a DeepSpeech project on AWS EC2. At minimum, you need the right CPU and enough memory.

The main requirement for the CPU is that it support the AVX instruction set, which rules out several instance types. Even on those that do support it, you need to make sure to use an HVM AMI in order to access the instruction set.

Beyond that, it’s helpful to note that DeepSpeech will only use one core of the CPU, so using an instance with a lot of cores will only help if transcribing multiple files in parallel.

Memory requirements are also an important consideration, and they depend on the size of the audio files. If you’re only transcribing small files, memory shouldn’t be an issue. Trying to work with 30-45 minute recordings has required some workarounds to keep the memory usage reasonable, especially in the preprocessing stage.

So, what instance type am I using? Right now, it’s a t3.small. It has the right kind of CPU and enough memory to do the preprocessing and transcribe small chunks at a time. However, I would need more memory if trying to transcribe a large audio file straight through.

If I were putting this in production, though, I think I would split preprocessing and transcribing and put the former on a C5 instance and the latter on either a C5 or P3 instance, after testing to see which works best for the requirements.

After picking an instance, installation is fairly easy. Just follow the instructions and it should work fine.

So, AWS experts, did I overlook an option that would suit DeepSpeech even better? Let me know!

–Harrison


Measuring Python Performance Using Py-Spy

When optimizing the performance of a program, it’s essential to test and measure what the bottlenecks are. Programmers are bad at guessing what part of a program will be the slowest. Trying to guess is likely to lead to sacrificing code readability for uncertain gains, or even losses in performance (Code Complete, 2nd Edition page 594).

In my case, I have a starry sky generator that I wanted to improve the performance of. The goal: to allow people to generate bigger images in a reasonable amount of time. So, how can we find out where improvements need to be made?

Enter Py-Spy

Py-Spy is a tool for profiling a Python program. When I was first optimizing the performance of my program, I used PyFlame, but that project is no longer maintained. Py-Spy does everything PyFlame did and more. Another nice bonus is that it isn’t limited to Linux. On top of all that, it’s also installable through pip, so it seems to be a big win!

To install it, just run pip install py-spy.

Py-Spy has a number of commands and options we can use to customize the output. For one, we can attach it to an already-running process (for example, a production web server that’s having issues we want to diagnose) using -p PID. This method will probably require you to run it as root (sudo) so it can access the memory of the other process. The method I will be using is to pass py-spy the command to start the Python program itself, which will look like py-spy [command] -- python stars.py.

Speaking of commands, there are three available: record, top, and dump. Top is an interesting one: it looks like the Unix top command, but instead shows data about which functions your program is spending the most time in. Dump just prints out the current call stack, and can only be used by attaching to an already-running process. This is useful if your program is getting hung up somewhere and you want to find out where.
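
For reference, those invocations look like this (the PID here is a placeholder):

$ py-spy top -- python stars.py
$ sudo py-spy dump --pid 12345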

For our purposes, though, the record command is the most useful. It comes with various options.

Record’s Options

$ py-spy record --help
py-spy-record 
Records stack trace information to a flamegraph, speedscope or raw file

USAGE:
    py-spy record [OPTIONS] --output <filename> --pid <pid> [python_program]...

OPTIONS:
    -p, --pid <pid>              PID of a running python program to spy on
    -o, --output <filename>      Output filename
    -f, --format <format>        Output file format [default: flamegraph]  [possible values:
                                 flamegraph, raw, speedscope]
    -d, --duration <duration>    The number of seconds to sample for [default: unlimited]
    -r, --rate <rate>            The number of samples to collect per second [default: 100]
    -s, --subprocesses           Profile subprocesses of the original process
    -F, --function               Aggregate samples by function name instead of by line number
    -g, --gil                    Only include traces that are holding on to the GIL
    -t, --threads                Show thread ids in the output
    -i, --idle                   Include stack traces for idle threads
    -n, --native                 Collect stack traces from native extensions written in Cython, C
                                 or C++
        --nonblocking            Don't pause the python process when collecting samples. Setting
                                 this option will reduce the perfomance impact of sampling, but
                                 may lead to inaccurate results
    -h, --help                   Prints help information
    -V, --version                Prints version information

ARGS:
    <python_program>...    commandline of a python program to run

There are a few options that are particularly noteworthy here. --format lets us pick between flamegraph, raw, or speedscope. We’ll be using flamegraph, but speedscope is interesting, too. You can examine a speedscope file using the webapp.

--function will group the output by function, rather than by line number. Both have pros and cons. Grouping by function is helpful to get an easier-to-understand overview, while grouping by line number can help you narrow it down further.

Finally, --rate tells py-spy how many times per second to sample the program. The default is 100, but I’ve found that adjusting this either up or down can help, especially if there are a lot of small, quick functions (or lines) that add up. It doesn’t hurt to play around with this and compare the resulting flamegraphs to see which seem the most useful.

Now, we can generate a flamegraph of the starry sky generator. I’ll be running py-spy record -o profile.svg --function -- python stars.py on this commit, modified to generate one image with the dimensions (2000, 2000).

Reading a Flamegraph

Here it is!

[Flamegraph, grouped by function: <module> (stars.py) 99.39%; generate_sky (stars.py) 98.43%; generate_sky_pixel (stars.py) 74.56%, within which cast (stars.py) 24.22% and generate_star_pixel (stars.py) 16.91%; putpixel (PIL/Image.py) 16.91%]

When running it yourself, you’ll get an SVG file that, when opened in your browser, will be bigger and easier to read, and includes some nice JavaScript features: you can hover over a block to see the full text and percentage, click on a block to zoom in on it, and search.

The graph is read from top to bottom. So, in this case, all of the time was spent in the module stars.py. Underneath that is the call to generate_sky, which also basically takes up all of the time. From there, things get more interesting. A portion of the time is taken up just by generate_sky (the part that doesn’t have any blocks beneath it), most of it is taken up by generate_sky_pixel, and some is used by putpixel.

Note that this isn’t grouped by time, but by function. These functions are called one-by-one, so if it were grouped by time, it would be a tiny block for generate_sky_pixel, then a tiny block for putpixel and so on several thousand times.

Since it’s grouped by function, we can more easily compare overall how much time is spent in a particular function versus another. At a glance, we can see that much more time is spent generating the pixel vs. putting it into the image.

A lot of time in generate_sky_pixel isn’t taken up by a sub function, but a fairly significant amount is still used by cast and others.

Let’s get a new graph, but grouped by line number instead of function: py-spy record -o profile.svg -- python stars.py

[Flamegraph, grouped by line: generate_sky_pixel’s calls to cast (stars.py:79) from stars.py:103 and stars.py:110 take roughly 15% each; generate_star_pixel’s time is split roughly evenly across stars.py:87-89, each calling planck; putpixel (PIL/Image.py:1678-1692) totals about 17%]

There’s a lot more information in this graph. For example, it calls attention to the fact that cast is called on two different lines in generate_sky_pixel. The time spent in generate_star_pixel is pretty evenly distributed between lines 87, 88, and 89, which makes sense, because those are the three lines that call planck.

There’s one more piece of information that will be useful: the total time it takes to generate an image. The flamegraph tells us what percentage of time each function/line takes, but it isn’t meant to measure the total run time. I created a performance.py file which uses timeit to generate ten images with the dimensions (900, 900) and returns the average number of seconds it took per image. In this case, it took 7.64 seconds. We can definitely do better.

Tuning the Performance

Now that we have the information from these two graphs, as well as the run time, we can get to work making it faster. Looking again at the first flamegraph above, it seems putpixel uses up a total of ~17% of the run time. The docs specifically warn that putpixel is relatively slow, so it seems like this should be a pretty easy win.

I experimented with several methods. One was storing the data as a list of lists of tuples, converting that to a NumPy array, and feeding the array to Image.fromarray; that brought the run time to 7.1 seconds, about a 7% savings. As you might imagine, that still wasn’t very good.
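For the curious, that intermediate attempt had roughly this shape (a sketch, not the original code–the real per-pixel generation is stubbed out with black pixels):

import numpy as np
from PIL import Image

# Build rows of (r, g, b) tuples during generation, convert to a
# uint8 array of shape (height, width, 3), and hand it to fromarray.
rows = [[(0, 0, 0) for _ in range(900)] for _ in range(900)]
image = Image.fromarray(np.array(rows, dtype=np.uint8), mode="RGB")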

The natural progression from there seemed to be to skip the lists altogether and start with a NumPy array filled with zeroes. For some reason, this was actually slower than putpixel: 7.89 seconds. I’m not a NumPy expert, so I’m not sure why. Perhaps mutating an array element by element is slow in NumPy, or maybe I was just doing it the wrong way. If someone takes a look and wants to let me know, I’d be happy to learn about this.

After that, I tried building up a bytearray, extending it with each pixel’s three color values as they were generated, then converting it to bytes and passing it to Image.frombytes(). Total run time: 6.44 seconds–roughly a 16% savings over putpixel.
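The winning approach looks something like this (again a sketch; the real star/sky pixel generation is stubbed out):

from PIL import Image

WIDTH = HEIGHT = 900

# Append each pixel's three color values to a bytearray as they're
# generated, then build the image in a single call at the end.
pixels = bytearray()
for _ in range(WIDTH * HEIGHT):
    r, g, b = 0, 0, 0  # stand-in for the real per-pixel generation
    pixels.extend((r, g, b))

image = Image.frombytes("RGB", (WIDTH, HEIGHT), bytes(pixels))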

Here’s what our flamegraph looks like now that we’ve settled on a putpixel replacement (and after splitting the bytearray.extend onto its own line, so that it will show up separately):

[Flamegraph, grouped by source line, after the change: the per-pixel work in generate_sky_pixel (stars.py:112, 119, and 126) now dominates, and the bytearray.extend call at stars.py:154 accounts for about 6% of samples.]

Line 154 (bytearray.extend(pixels)) now only took up about 6% of the time. Even on a small image of 900 by 900 pixels, this resulted in a savings of over a second per image. For bigger images, this savings is in the range of several seconds.

Everything else in the program is directly related to image generation and the math and random number generation behind it, so assuming all of that is already optimal (spoiler alert: it isn’t; the cast() function was entirely unnecessary), this is about as fast as the program can get.

Conclusion

Flamegraphs and the profilers that generate them are useful tools for understanding the performance of a program. Using them, you can avoid trying to guess where bottlenecks are and potentially doing a lot of work for little gain.

For further reading, I recommend this article about the reasoning behind the creation of flamegraphs and the problem they were trying to solve. If you’re struggling to understand how to read the graph, it may help more than my explanation.

Now, go forth and conquer your performance problems!

CategoriesPythonTesting

Xonsh + Mut.py: Filtering Mut.py’s Output

Mut.py is a useful tool for performing mutation testing on Python programs. If you want to learn more about that, see my blog post over on the PyBites blog. In short, mutation testing helps us test our tests to make sure that they cover the program completely by making small changes to the code and then rerunning the tests for each change.

That’s useful information, but sometimes the output can be a bit too much. As an example, let’s set up a small program and some tests.

# example.py
import math

def check_prime(number):
    if number < 2:
        return False

    for i in range(2, int(math.sqrt(number)) + 1):
        if number % i == 0:
            break
    else:
        return True

    return False

# test_example.py
import pytest

from example import check_prime

def test_prime():
    assert check_prime(7)

The test passes! However, it should be obvious that simply making sure that our function can tell that 7 is prime isn’t enough to cover all its functionality. Let’s see what mut.py has to say about this.

$ mut.py --target example --unit-test test_example --runner pytest                                                                                          
[*] Start mutation process:
   - targets: example
   - tests: test_example
[*] 1 tests passed:
   - test_example [0.55317 s]
[*] Start mutants generation and execution:
   - [#   1] AOR example: [0.06451 s] survived
   - [#   2] AOR example: [0.05991 s] survived
   - [#   3] BCR example: [0.06143 s] survived
   - [#   4] COI example: [0.11853 s] killed by test_example.py::test_prime
   - [#   5] COI example: [0.11193 s] killed by test_example.py::test_prime
   - [#   6] CRP example: [0.06105 s] survived
   - [#   7] CRP example: [0.06164 s] survived
   - [#   8] CRP example: [0.06224 s] survived
   - [#   9] CRP example: [0.11637 s] killed by test_example.py::test_prime
   - [#  10] ROR example: [0.11332 s] killed by test_example.py::test_prime
   - [#  11] ROR example: [0.06272 s] survived
   - [#  12] ROR example: [0.11869 s] killed by test_example.py::test_prime
[*] Mutation score [1.67688 s]: 41.7%
   - all: 12
   - killed: 5 (41.7%)
   - survived: 7 (58.3%)
   - incompetent: 0 (0.0%)
   - timeout: 0 (0.0%)

Looks like there’s still a lot of work to be done. Seven out of the twelve mutants survived the test, but this doesn’t tell us anything about where the coverage is lacking. Let’s try adding the -m flag to mut.py.

$ mut.py --target example --unit-test test_example --runner pytest -m
[*] Start mutation process:
   - targets: example
   - tests: test_example
[*] 1 tests passed:
   - test_example [0.57480 s]
[*] Start mutants generation and execution:
   - [#   1] AOR example: 
--------------------------------------------------------------------------------
   2: 
   3: def check_prime(number):
   4:     if number < 2:
   5:         return False
-  6:     for i in range(2, int(math.sqrt(number)) + 1):
+  6:     for i in range(2, int(math.sqrt(number)) - 1):
   7:         if number % i == 0:
   8:             break
   9:     else:
  10:         return True
--------------------------------------------------------------------------------
[0.07507 s] survived
   - [#   2] AOR example: 
--------------------------------------------------------------------------------
   3: def check_prime(number):
   4:     if number < 2:
   5:         return False
   6:     for i in range(2, int(math.sqrt(number)) + 1):
-  7:         if number % i == 0:
+  7:         if number * i == 0:
   8:             break
   9:     else:
  10:         return True
  11:     
--------------------------------------------------------------------------------
[0.17719 s] survived
   - [#   3] BCR example: 
--------------------------------------------------------------------------------
   4:     if number < 2:
   5:         return False
   6:     for i in range(2, int(math.sqrt(number)) + 1):
   7:         if number % i == 0:
-  8:             break
+  8:             continue
   9:     else:
  10:         return True
  11:     
  12:     return False
--------------------------------------------------------------------------------
[0.05884 s] survived
   - [#   4] COI example: 
--------------------------------------------------------------------------------
   1: import math
   2: 
   3: def check_prime(number):
-  4:     if number < 2:
+  4:     if not (number < 2):
   5:         return False
   6:     for i in range(2, int(math.sqrt(number)) + 1):
   7:         if number % i == 0:
   8:             break
--------------------------------------------------------------------------------
[0.11854 s] killed by test_example.py::test_prime
   - [#   5] COI example: 
--------------------------------------------------------------------------------
   3: def check_prime(number):
   4:     if number < 2:
   5:         return False
   6:     for i in range(2, int(math.sqrt(number)) + 1):
-  7:         if number % i == 0:
+  7:         if not (number % i == 0):
   8:             break
   9:     else:
  10:         return True
  11:     
--------------------------------------------------------------------------------
[0.11615 s] killed by test_example.py::test_prime
[SNIP!]

Suddenly, it’s too much information!

We get the mutation and its context, which helps pinpoint where the tests need improving, but since even killed mutants show up here, it’s hard to tell at a glance which diffs are safe to ignore and which deserve attention.

Let’s pause for a moment here and talk about xonsh’s Callable Aliases. Xonsh, like Bash, has the ability to add aliases for common commands. Unlike Bash, xonsh’s aliases are also the method we can use to access Python functions from subprocess mode.

Aliases are stored in a dictionary-like mapping called, aptly, aliases. So we can add an alias by setting a key:

aliases["lt"] = "ls --human-readable --size -1 -S --classify"

Callable aliases extend this idea to form a bridge between a Python function and subprocess mode. Normally, using anything from Python in subprocess mode requires special syntax–useful, but limited.
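For instance, xonsh’s @() operator evaluates a Python expression and splices the result into a command (a list even expands into multiple arguments):

$ files = ["a.txt", "b.txt"]
$ echo @(files)
a.txt b.txt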

We can define a callable alias just like any Python function. Since our goal is to filter out some of the noise in mut.py’s output, let’s get started on that.

A callable alias can be passed the arguments from the command (as a list of strings), stdin, stdout, and a couple of other, more obscure values. Our function will need stdin, which means args will also be defined–xonsh determines what values to pass in based on argument position, not name.

Here’s how to register the alias with xonsh:

# ~/.xonshrc
def _filter_mutpy(args, stdin=None):
    if not stdin:
        return "No input to filter"

aliases["filter_mutpy"] = _filter_mutpy
$ filter_mutpy
No input to filter

Success! When called with no stdin, there’s nothing for our function to parse. Xonsh accepts a string as the return value, which is appended to stdout. There are two more optional values that could also be used: stderr and a return code. To use them, just return a tuple like (stdout, stderr) or (stdout, stderr, return code).
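For example, a hypothetical alias (the name and message here are made up) that signals failure might return all three:

def _fail_demo(args, stdin=None):
    # (stdout, stderr, return_code): no output, an error message, exit code 1
    return None, "something went wrong\n", 1

aliases["fail_demo"] = _fail_demo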

Now that we have our alias configured in xonsh, it’s time to add the functionality we want: taming mut.py’s output.

def _filter_mutpy(args, stdin=None):
    if stdin is None:
        return "No input to filter"

    mutant = []
    collect_mutant = False
    for line in stdin:
        # A timing line (containing " s] ") ends the current mutant's report.
        if " s] " in line and collect_mutant:
            collect_mutant = False
            mutant.append(line)
            if "incompetent" in line or "killed" in line:
                # Killed and incompetent mutants need no attention, so print
                # only the header and timing lines, dropping the diff between.
                print(mutant[0], end="")
                print(mutant[-1], end="")
            else:
                # Survivors are the interesting ones: print the full report.
                print("".join(mutant), end="")
            mutant = []
        # A header line (containing "- [#") starts a new mutant's report.
        elif "- [#" in line and not collect_mutant:
            collect_mutant = True
            mutant.append(line)
        elif collect_mutant:
            mutant.append(line)
        else:
            # Anything outside a mutant report passes through untouched.
            print(line, end="")

aliases["filter_mutpy"] = _filter_mutpy

Now we can pipe mut.py into our alias and get this result:

$ mut.py --target example --unit-test test_example --runner pytest -m | filter_mutpy                                                                        
[*] Start mutation process:
   - targets: example
   - tests: test_example
[*] 2 tests passed:
   - test_example [0.52779 s]
[*] Start mutants generation and execution:
   - [#   1] AOR example: 
[0.12564 s] killed by test_example.py::test_not_prime
   - [#   2] AOR example: 
[0.12044 s] killed by test_example.py::test_not_prime
   - [#   3] BCR example: 
[0.12248 s] killed by test_example.py::test_not_prime
   - [#   4] COI example: 
[0.12042 s] killed by test_example.py::test_prime
   - [#   5] COI example: 
[0.11927 s] killed by test_example.py::test_prime
   - [#   6] CRP example: 
--------------------------------------------------------------------------------
   1: import math
   2: 
   3: def check_prime(number):
-  4:     if number < 2:
+  4:     if number < 3:
   5:         return False
   6:     for i in range(2, int(math.sqrt(number)) + 1):
   7:         if number % i == 0:
   8:             break
--------------------------------------------------------------------------------
[0.06259 s] survived
   - [#   7] CRP example: 
[0.12793 s] killed by test_example.py::test_not_prime
   - [#   8] CRP example: 
--------------------------------------------------------------------------------
   2: 
   3: def check_prime(number):
   4:     if number < 2:
   5:         return False
-  6:     for i in range(2, int(math.sqrt(number)) + 1):
+  6:     for i in range(2, int(math.sqrt(number)) + 2):
   7:         if number % i == 0:
   8:             break
   9:     else:
  10:         return True
--------------------------------------------------------------------------------
[0.06272 s] survived
   - [#   9] CRP example: 
[0.12543 s] killed by test_example.py::test_prime
   - [#  10] ROR example: 
[0.12325 s] killed by test_example.py::test_prime
   - [#  11] ROR example: 
--------------------------------------------------------------------------------
   1: import math
   2: 
   3: def check_prime(number):
-  4:     if number < 2:
+  4:     if number <= 2:
   5:         return False
   6:     for i in range(2, int(math.sqrt(number)) + 1):
   7:         if number % i == 0:
   8:             break
--------------------------------------------------------------------------------
[0.06679 s] survived
   - [#  12] ROR example: 
[0.12549 s] killed by test_example.py::test_prime
[*] Mutation score [2.04439 s]: 75.0%
   - all: 12
   - killed: 9 (75.0%)
   - survived: 3 (25.0%)
   - incompetent: 0 (0.0%)
   - timeout: 0 (0.0%)

Awesome! Every code snippet now belongs to a mutant that survived, so we can see at a glance which ones are important–and I used exactly that to improve the tests, so more cases are covered and more mutants are killed.
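For reference, the improved tests probably looked something like this (a sketch–the output above only confirms that a test named test_not_prime was added):

# test_example.py – sketch of the improved tests
from example import check_prime

def test_prime():
    assert check_prime(7)

def test_not_prime():
    # Composites and numbers below 2 should all be rejected.
    assert not check_prime(8)
    assert not check_prime(9)
    assert not check_prime(1)

Notice that the three survivors in the filtered output all tweak the boundary around 2, so a test asserting check_prime(2) would likely kill them too.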

This is a relatively simple example of xonsh’s power, but remember that the entire Python standard library and ecosystem are available to parse, filter, and act on the output of any command-line interface.

I’m looking forward to discovering more ways to use callable aliases in my work. Got any ideas?

CategoriesPython

Generating a Starry Sky

Years ago, I discovered netpbm’s ppmforge, which has an option to generate a starry sky; it fascinated me. (The author’s website is full of interesting projects and papers, too.) I set out to rewrite it in Python, which I had just recently started learning at the time. Following is some code from 2012. If you want to compare it to ppmforge, the relevant lines are 63 and 440-508. You’ll probably find that my younger self didn’t do a great job of translating everything to Python.

import random
import Image
import cStringIO
 
def cast(low, high):
    arand = (2.0**15.0) - 1.0
    return ((low)+(((high)-(low)) * ((random.randint(0, 12345678) & 0x7FFF) / arand)))
 
def make_star_image(width=600, height=600, star_fraction=3, star_quality=0.5, star_intensity=8, star_tint_exp=0.5, bg=None, lambd=0.0025):
    star_data = []
     
    if bg == None:
        star_image = Image.new("RGB", (width, height))
        for i in range(0, width):
            for l in range(0, height):
                if random.expovariate(1.5) < star_fraction:
                    v = int(star_intensity * ((1 / (1 - cast(0, 0.999))**star_quality)))
                    if v > 255:
                        v = 255
                    star_data.append((v, v, v))
                else:
                    star_data.append((0, 0, 0))
    else:
        index = 0
        if bg.mode != "RGB":
            bg = bg.convert("RGB")
        width, height = bg.size
        star_image = Image.new("RGB", (width, height))
        bg = bg.getdata()
        for i in range(0, width):
            for l in range(0, height):
                r, g, b = bg[index]
                average = (r + b + g) / 3
                r = random.expovariate(lambd)
                if r < average or random.random() > 0.9:
                    v = int(star_intensity * ((1 / (1 - cast(0, 0.999))**star_quality)))
                    if v > 255:
                        v = 255
                    if r < average:
                        if v > 40:
                            v = int(v * 1.5)
                            if v > 255:
                                v = 255
                            star_data.append((v, v, v))
                        elif v < 40 and random.random() > 0.5:
                            star_data.append((v, v, v))
                        else:
                            star_data.append((0, 0, 0))
                    else:
                        star_data.append((v, v, v))
                else:
                    star_data.append((0, 0, 0))
                index += 1
    star_image.putdata(star_data)
    return star_image
 
def main():
    make_star_image(width=1280, height=800, star_quality=1.2, lambd=0.0025, star_intensity=1, star_fraction=1).show()
    #bg = Image.open("/home/harrison/Pictures/wallpapers/bp1.jpg")
    #make_star_image(bg=bg, lambd=0.0035).show()
#    io = cStringIO.cStringIO()
#    starry.save(io, "JPEG")
#    data = io.getvalue()
#    
#    print "content-type: image/jpeg"
#    print
#    print data
 
if __name__ == "__main__":
    main()

The temperature calculation is off, the cast function isn’t quite right, and I didn’t even attempt the blackbody radiation calculations from the planck macro. The list could go on.

My most recent iteration of a starry sky generator is available on GitHub. This version is based on a deeper understanding of the math involved. For example, it turned out that the cast function was unnecessary, as its functionality is basically already built into Python’s random module. I’m still working on understanding the planck function, so if you know much about blackbody radiation, I’d be happy to talk to you about it!

I like to compare these two versions of basically the same program, because it illustrates, in my mind, the idea of a Pythonic program. The improvements in the latest version are a result of better understanding both the problem and the solution.

The first version required the cast function because it just copied syntax over from the C program and made it work in Python. After taking the time to understand the problem that function was trying to solve, and learning the best way to solve it in Python, I was able to replace the function entirely.

So, what would have been a troublesome function for a reader of the code to puzzle over became an easily understood standard library call.
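To make the comparison concrete, here’s the old function next to its standard-library equivalent (a sketch of the equivalence, not code lifted from either version):

import random

def cast(low, high):
    # The old way: scale a masked 15-bit random integer into [low, high].
    arand = (2.0 ** 15.0) - 1.0
    return low + (high - low) * ((random.randint(0, 12345678) & 0x7FFF) / arand)

# The standard library already provides this distribution directly:
print(cast(0, 0.999), random.uniform(0, 0.999))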

Another example is in the control flow. The old version is deeply nested and confusing. The new one breaks more of it out into separate functions, uses clearer variable names, and makes better use of white space for grouping. I think it’s a lot easier to follow, besides looking prettier.

This comes up a lot when I’m solving code challenges on PyBites, too. When you compare solutions, it’s easy to see that some are better than others. Sometimes I’m happy with my solution, but sometimes it leaves a lot to be desired. It depends on how much experience I have with that kind of problem and how much time I put into understanding it.

The way to grow is to keep reading good examples of code and practicing. We can’t master every area, but we can keep improving!

There are several performance-related improvements in the new version, as well. For example, this version uses a bytearray to store the pixels before converting to an Image. In another post, I’ll go through the performance measurements I used to determine that this method is significantly faster than the other options.

I’m sure that there is still a lot that could be better about my latest starry sky generator, but it’s nice to be able to compare and see how much I’ve grown so far.

The Django app is currently hosted on Google App Engine, so go ahead and check it out!

Or, see this page if you want to use text like the featured image.

CategoriesPython

Introduction to Xonsh

Recently, I got started with xonsh (pronounced “conch”) as a replacement shell for Bash.

What is xonsh, you might ask? Well, basically, it’s a version of Python meant for use as a shell. Since it’s a superset of Python, all Python programs are valid xonsh shell scripts, so you can make use of Python’s standard library and any other Python package you have available.

Probably my favorite feature, though, is being able to transfer my Python knowledge to shell scripting. As the feature comparison puts it, xonsh is a “sane language.”

That means we can do math (and a lot more!) directly in the shell, like so:

$ (5 + 5) ** 5
100000

However, we can also write commands, just like in Bash:

$ curl example.com
<!doctype html>
<html>
<head>
    <title>Example Domain</title>
...

Xonsh handles this by having two modes, which it chooses between automatically for each line. Python mode is the default, but any time a line contains only an expression statement and its names are not all defined variables, it is interpreted as a command in subprocess mode.
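You can watch the switch happen by shadowing a command name with a variable (a contrived example):

$ echo hello
hello
$ echo = "now a Python variable"
$ echo
'now a Python variable'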

When you install xonsh and run it without a .xonshrc file, you’ll be presented with a welcome screen:

            Welcome to the xonsh shell (0.9.13.dev1)                              

            ~ The only shell that is also a shell ~                              

----------------------------------------------------
xonfig tutorial   ->    Launch the tutorial in the browser
xonfig wizard     ->    Run the configuration wizard and claim your shell 
(Note: Run the Wizard or create a ~/.xonshrc file to suppress the welcome screen)

Going through the wizard will present you with a ton of options. Most of them don’t need changing, but there are some useful ones, like the prompt style and various plugins.

That’s just the tip of the iceberg. Xonsh has a lot more to offer, but I’m still exploring the possibilities.

For further reading, the guides cover a lot of topics in-depth.