Integrating vue-cli with Django

For the Airavata Django Portal project I recently worked on updating the javascript build scripts from a set of cobbled-together Webpack configurations to vue-cli. There were several advantages to switching to vue-cli:

  • Less idiosyncratic build configuration for the different Django apps. The UI for the Django Portal is broken into several Django apps, each with its own frontend code, plus a package of common frontend code shared between them. A couple of these were being built in very different ways since they started from very different Webpack templates.
  • Added functionality like integrated linting on save and Hot Module Replacement (HMR). Getting a Vue.js frontend app to build with Webpack is reasonably doable. But adding additional functionality like HMR requires quite a bit of extra work and that work would have to be replicated, with some adjustments, to each Django app. Using vue-cli allows us to get all of the goodies of modern javascript tooling for free.

In this post I’ll recap the issues I ran into and how I solved them. To see the vue-cli configuration that I ended up with, check out the vue.config.js and related configuration for one of the Django apps (in this case, the workspace app) in the airavata-django-portal repository.

Getting Started

vue-cli has an easy way to create a project from scratch, but I needed to integrate it with existing Vue.js projects. What I did was generate a dummy project in a completely separate folder and then look at what was generated and copy in the necessary bits to the existing Vue.js projects. Here are some things that were different and needed to be copied over:

  • In our old config we were using .babelrc files; vue-cli generates a babel.config.js file instead (and you don’t want both of them). See the sketch after this list.
  • From the generated package.json file I copied the scripts, devDependencies, eslintConfig, postcss, and browserslist sections.
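
For reference, the babel.config.js that vue-cli generates is essentially just the following (the exact preset name depends on the vue-cli version; this is the vue-cli 3 style):

module.exports = {
  presets: ["@vue/app"]
};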

webpack-bundle-tracker and django-webpack-loader

There are two basic approaches one could take to integrate the generated Webpack bundles with the backend Django templates:

  1. Generate Webpack bundles with expected file names (so, no cache-busting hashes) the same way for dev and production modes. This way the path to the generated bundle files is known in advance and can be hardcoded in the Django templates. This is what we were doing in Airavata Django Portal before this integration.
  2. Load the Webpack bundles dynamically. That is, figure out what files were generated for a Webpack bundle and load those. Webpack is free to name the files however it needs to; it can even provide URLs to these files if they are dynamically generated as in the case of the dev server.

With the migration to vue-cli I wanted to get the benefits that come with approach #2. Getting #2 to work requires generating bundle metadata and having a library that can load that metadata. Luckily, both of those already exist: webpack-bundle-tracker is a Webpack plugin that generates a JSON file with the needed bundle metadata, and django-webpack-loader is a Django app that provides template tags that read the bundle metadata and load the appropriate files.

See the workspace app’s vue.config.js for how to integrate webpack-bundle-tracker, and the portal’s settings.py for how to integrate django-webpack-loader. Once integrated, the bundles can be loaded in the Django templates; see the workspace app’s base.html for an example.
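
Here is a rough sketch of the Django side (the bundle directory and stats file paths are illustrative, not the exact ones used in the portal):

# settings.py -- django-webpack-loader configuration (sketch)
import os

# BASE_DIR is the usual Django settings variable pointing at the project root
WEBPACK_LOADER = {
    'DEFAULT': {
        'BUNDLE_DIR_NAME': 'django_airavata_workspace/dist/',
        'STATS_FILE': os.path.join(
            BASE_DIR, 'apps', 'workspace', 'static',
            'django_airavata_workspace', 'dist', 'webpack-stats.json'),
    }
}

In a Django template, a bundle is then loaded with {% load render_bundle from webpack_loader %} and {% render_bundle 'app' %}, where the argument is the name of the webpack entry point.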

To get the bundle loading to work, however, I do need to generate the same set of files in production and development, since I need to know which bundles to load in the Django templates. In vue-cli the dev server mode just generates a single javascript file to be loaded, but in production mode there are potentially three files generated: one for vendor code, one for common code (if there are multiple entry points), and one for the entry point’s own code (and similarly for CSS). To make the development build produce the same chunks, I ran npx vue inspect --mode production and inspected the production chunk configuration:

...
    splitChunks: {
      cacheGroups: {
        vendors: {
          name: 'chunk-vendors',
          test: /[\\/]node_modules[\\/]/,
          priority: -10,
          chunks: 'initial'
        },
        common: {
          name: 'chunk-common',
          minChunks: 2,
          priority: -20,
          chunks: 'initial',
          reuseExistingChunk: true
        }
      }
    }
...

and then copied this into the appropriate part of the vue.config.js file so that both modes produce the same chunks (a sketch follows).
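
Putting it together, the relevant part of vue.config.js looks roughly like this (a sketch; webpack-bundle-tracker option names differ slightly between versions, and this is the 0.4.x style):

// vue.config.js (sketch)
const BundleTracker = require("webpack-bundle-tracker");

module.exports = {
  configureWebpack: {
    plugins: [
      // Writes webpack-stats.json, which django-webpack-loader reads
      new BundleTracker({ filename: "webpack-stats.json" })
    ],
    optimization: {
      // Copied from `npx vue inspect --mode production` so that the dev
      // server produces the same chunks as the production build
      splitChunks: {
        cacheGroups: {
          vendors: {
            name: "chunk-vendors",
            test: /[\\/]node_modules[\\/]/,
            priority: -10,
            chunks: "initial"
          },
          common: {
            name: "chunk-common",
            minChunks: 2,
            priority: -20,
            chunks: "initial",
            reuseExistingChunk: true
          }
        }
      }
    }
  }
};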

Local packages

As mentioned above, there are a couple of common packages that the Vue.js frontend code makes use of. One contains common UI code and the other contains code for making calls to the REST services to load data. These are linked into the Vue.js projects via relative paths in the dependencies section of the package.json file:

{
...
  "dependencies": {
...
    "django-airavata-api": "file:../api",
    "django-airavata-common-ui": "file:../../static/common",
...
  },
...
}

For reasons that aren’t entirely clear to me, this caused problems with vue-cli. When running ESLint, for example, vue-cli would complain that it couldn’t find the ESLint config file for these relatively linked packages. I had a similar problem with PostCSS. This comment on issue #2539 gave me the config I needed to force using the project’s own ESLint and PostCSS config:

const path = require('path');
module.exports = {
  chainWebpack: config => {
    // Force eslint-loader to use this project's ESLint config instead of
    // looking for one in the locally linked packages
    config.module
      .rule('eslint')
      .use('eslint-loader')
      .tap(options => {
        options.configFile = path.resolve(__dirname, ".eslintrc.js");
        return options;
      })
  },
  css: {
    loaderOptions: {
      // Likewise, force PostCSS to load its config from this project
      postcss: {
        config: {
          path: __dirname
        }
      }
    }
  }
}

Hot Module Replacement

To get HMR working I needed to have the following configuration to allow loading the JS and CSS files from the dev server on a separate port (9000), since I also have the Django server running on localhost on another port (8000):

  devServer: {
    port: 9000,
    headers: {
      // Allow the Django-served pages (on port 8000) to load assets from the
      // dev server (on port 9000)
      "Access-Control-Allow-Origin": "*"
    },
    hot: true,
    hotOnly: true
  }

Other changes

By default, vue-cli uses the runtime-only build of Vue, which doesn’t include the template compiler, so the entry point cannot use a template string. This meant I needed to change the entry point code to use a render function instead of a template string. For example, instead of

import Vue from 'vue'
import BootstrapVue from 'bootstrap-vue'
import ViewExperimentContainer from './containers/ViewExperimentContainer.vue'

// This is imported globally on the website so no need to include it again in this view
// import 'bootstrap/dist/css/bootstrap.css'
import 'bootstrap-vue/dist/bootstrap-vue.css'

Vue.use(BootstrapVue);

new Vue({
  el: '#view-experiment',
  template: '<view-experiment-container :initial-full-experiment-data="fullExperimentData" :launching="launching"></view-experiment-container>',
  data () {
      return {
          fullExperimentData: null,
          launching: false,
      }
  },
  components: {
      ViewExperimentContainer,
  },
  beforeMount: function () {
      this.fullExperimentData = JSON.parse(this.$el.dataset.fullExperimentData);
      if ('launching' in this.$el.dataset) {
          this.launching = JSON.parse(this.$el.dataset.launching);
      }
  }
})

I needed this essentially equivalent code that uses a render function instead:

import Vue from "vue";
import BootstrapVue from "bootstrap-vue";
import ViewExperimentContainer from "./containers/ViewExperimentContainer.vue";

// This is imported globally on the website so no need to include it again in this view
// import 'bootstrap/dist/css/bootstrap.css'
import "bootstrap-vue/dist/bootstrap-vue.css";

Vue.use(BootstrapVue);

new Vue({
  render(h) {
    return h(ViewExperimentContainer, {
      props: {
        initialFullExperimentData: this.fullExperimentData,
        launching: this.launching
      }
    });
  },
  data() {
    return {
      fullExperimentData: null,
      launching: false
    };
  },
  beforeMount() {
    this.fullExperimentData = JSON.parse(this.$el.dataset.fullExperimentData);
    if ("launching" in this.$el.dataset) {
      this.launching = JSON.parse(this.$el.dataset.launching);
    }
  }
}).$mount("#view-experiment");

One thing I learned in this process is that the vue-template-compiler version needs to be the same as the version of Vue.js, otherwise you get an error like this:

Module build failed (from ./node_modules/vue-loader/lib/index.js):
Error: [vue-loader] vue-template-compiler must be installed as a peer dependency, or a compatible compiler implementation must be passed via options.
    at loadTemplateCompiler (/Users/machrist/Airavata/django/django_airavata_gateway/django_airavata/apps/dataparsers/node_modules/vue-loader/lib/index.js:21:11)
    at Object.module.exports (/Users/machrist/Airavata/django/django_airavata_gateway/django_airavata/apps/dataparsers/node_modules/vue-loader/lib/index.js:65:35)

Just make sure you reference the same version of both as dependencies in package.json.
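
For example (the version numbers here are illustrative; the point is that they match):

{
  "dependencies": {
    "vue": "^2.6.10"
  },
  "devDependencies": {
    "vue-template-compiler": "^2.6.10"
  }
}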

Conclusion

The dev experience is now better than ever. Just start up the Python server

source venv/bin/activate
python manage.py runserver

Then navigate to the Django app folder and run

npm run serve

Now we have hot module replacement and linting on save.

I think some improvements can still be made. For one, there is still a good bit of boilerplate config needed for each Django app; it would be good if it could be shared. Also, I investigated whether there was a webpack-bundle-tracker vue-cli plugin. It turns out there are two, but they don’t quite do what I want. Maybe I’ll make a third one? 🙂

How to create a VirtualBox VM with a static IP and internet access

Introduction

Recently I’ve been working on installing Apache Airavata in a VirtualBox VM running on my laptop using our “standalone” Ansible installation settings. The goal is to have a locally running instance of Airavata that I can connect to when developing the Airavata Django Portal which I’ve been working on. That means I need Django running on my laptop to be able to access the VM (host-to-guest access) and the VM does need to be able to access the internet (guest-to-internet access) since the Ansible playbooks that are executed against the VM download and install software from the internet.

It turns out that getting this set up is not so trivial, but also, it’s not that hard once you know what VirtualBox provides and how to configure it. In summary, the approach I’ll give here is to create a VirtualBox VM:

  • with the default NAT network adapter (for internet access)
  • and then add a host-only network adapter and configure the VM with a static IP address (for host-to-guest access)

A quick word about VirtualBox networking modes. You can read all about the various networking modes here but here’s a quick summary:

  • NAT – the networking mode of the default network adapter when you create a new VM. This gives internet access but applications running on the host can’t make network connections to the VM.
  • Bridged – with this mode VirtualBox uses a special driver for the host’s physical network interface to create a virtual network interface for the VM. The VM gets an IP on the same network that the host is physically connected to. Host-to-guest communication and internet access are available.
  • Host-only – with this mode VirtualBox creates a virtual network that the host and the VMs are connected to. This allows host-to-guest communication but this virtual network has no access to the internet.

Now you might be wondering, why not just use a bridged network adapter? Well, you can, but there is one substantial downside. Whenever the network the host is connected to changes, the IP address of the VM will change. This is exacerbated in my case by the fact that I exclusively use wireless networks on my laptop, so my network is regularly changing. Also, I really need a static IP address for the VM to configure the Ansible scripts and because part of the process is to generate a self-signed SSL certificate for the VM’s IP address. But, if you’re using a wired workstation or you don’t have a lot of configuration dependent on the VM’s IP address, bridged networking might be a good solution to get you both internet access and host-to-guest networking.

Installing CentOS 7

Creating a CentOS 7 VM is covered well in other places (I used Jeramy Singleton’s guide), so I won’t cover all of the steps here. But here are some quick pointers:

  • Set the type of the VM to Linux and the version to Red Hat (64-bit)
  • Download a minimal ISO from https://www.centos.org/download/
  • Log in as root, change the working directory to /etc/sysconfig/network-scripts/, edit the ifcfg-enp0s3 config file, and set ONBOOT to yes. Then reboot the VM to get network access.

Also note that whereas in Jeramy Singleton’s instructions he has you create a port forward (2222->22) to be able to SSH into the VM, in the following we’ll add a host-only network instead and use that IP address to SSH into the VM on the standard port 22.

Configuring host-only network

First, make sure that there is a host-only network to connect to. In my case, a default one was already created, called vboxnet0. To check if you already have one, start VirtualBox, click on the Global Tools button, and make sure you are on the Host Network Manager tab.

host-only network details in the Host Network Manager

Take note of the IP Address of the network and the network mask. In the screenshot above, the IP Address is 192.168.99.1 with a network mask of 255.255.255.0, which means I can statically assign IP addresses 192.168.99.2-254. I’ve disabled the DHCP server since I’ll assign IP addresses statically, but in theory you could use both static and dynamic IP assignment (if you do that, note that the DHCP server hands out IP addresses from 100-254 by default, so don’t statically assign those).

Now add a host-only network adapter to the VM. First, make sure that the VM is shut down. Next, in the VirtualBox app select the VM and click on the Settings button. Click on the Network tab. Adapter 1 should be your NAT adapter. Click on the Adapter 2 subtab, select Host-only Adapter and the name of the host-only network (vboxnet0 in this case).

Adding a Host-only adapter to the VM

Click OK and start up the VM. Log in as root through VirtualBox console. Run

$ ip addr

to find the name of the host-only network interface. In my case it was called enp0s8.

Create a file called ifcfg-enp0s8 in /etc/sysconfig/network-scripts/ and give it the following contents:

DEVICE=enp0s8
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.99.10
NETMASK=255.255.255.0

Where NETMASK should match the settings for your host-only network as obtained above and IPADDR should be an available IP address in the host-only network (again, typically in the 2-254 range).

Now run

$ systemctl restart network

Now when you run

$ ip addr

you should see the IP address you configured in the ifcfg-enp0s8 file.

“ip addr” shows the IP address for the host-only adapter interface

You should now be able to SSH to the VM from the host OS:

(host OS)
$ ssh root@192.168.99.10

You can now connect applications running on the host OS to network services running on the VM via the host-only network and the VM can also connect to the wider internet via the NAT interface.

Dynamically including Django apps using entry points

For the Airavata Django Portal project I’ve been looking at how third-party contributors could contribute Django apps that would integrate nicely with the portal. This is not something that Django has built-in support for, as can be seen in Django ticket #29554, but people have come up with workarounds.

The most compelling workaround I found was discussed on the django-developers mailing list and is implemented by the pretix open source application. One clever thing they are doing is using Python packaging entry points (https://packaging.python.org/specifications/entry-points/). Entry points can be dynamically introspected; for example, pretix iterates over pretix.plugin entry points in its settings.py.

This can be used in the setup.py of a Django app to register its AppConfig. A Django project can then introspect any such Django apps that are installed in the current virtual environment and add all of them to INSTALLED_APPS in settings.py, just like pretix does.
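
Here’s a sketch of both sides, assuming an entry point group named airavata.djangoapp (the group name, package name, and app name are all illustrative):

# setup.py of the contributed Django app: register its AppConfig under the
# entry point group
from setuptools import setup, find_packages

setup(
    name="my-custom-django-app",
    version="0.1",
    packages=find_packages(),
    entry_points={
        "airavata.djangoapp": [
            "my_custom_app = my_custom_app.apps:MyCustomAppConfig",
        ],
    },
)

# settings.py of the Django project: discover contributed apps and add them
# to INSTALLED_APPS (assumed to be defined as a list above)
from pkg_resources import iter_entry_points

for entry_point in iter_entry_points(group="airavata.djangoapp"):
    # "my_custom_app.apps:MyCustomAppConfig" -> "my_custom_app.apps.MyCustomAppConfig",
    # which Django accepts as an INSTALLED_APPS entry
    INSTALLED_APPS.append(
        "{}.{}".format(entry_point.module_name, entry_point.attrs[0])
    )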

The other thing to configure when adding a Django app is to add the app’s urls to the project’s urls.py. Again we can loop over all of the Django app entry points and get their AppConfigs. We can either allow these apps to specify what URL prefix they want, as a property on the AppConfig, or we can just assign each app a URL with a common prefix of plugins/ followed by the app’s label (which must be unique). The urls module of the Django app can be figured out if you have an instance of its AppConfig:

from importlib import import_module
# where app is an AppConfig instance retrieved from django.apps
urls = import_module(".urls", app.name)
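
A sketch of the dynamic URL inclusion could look like the following (the is_custom_app marker attribute is hypothetical; how plugin apps are identified is up to the project):

# urls.py of the Django project
from django.apps import apps
from django.urls import include, path

urlpatterns = [
    # ... the project's own URL patterns ...
]

# Mount each plugin app under plugins/<app label>/
for app in apps.get_app_configs():
    if getattr(app, "is_custom_app", False):
        urlpatterns.append(
            path("plugins/{}/".format(app.label),
                 include("{}.urls".format(app.name)))
        )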

In the case of the Airavata Django Portal I want to know a bit more about the apps in order to integrate them into the overall navigation. The following are properties that can be specified on the AppConfig:

  • url_home: This is the home URL of this app. I could default it to just the first URL pattern of the app:
from importlib import import_module
urls = import_module(".urls", app.name)
url_home = urls.app_name + ":" + urls.urlpatterns[0].name
  • app_order: the desired order of the app in the listing. Could default to last if not specified
  • fa_icon_class: FontAwesome icon class to use for this app. Could default to something generic
  • app_description: a description of the app. Could default to just the verbose_name of the app

As indicated in the list above, all of these extra bits of metadata should be optional and be provided as needed.
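
Here’s a sketch of an AppConfig carrying this metadata (the values are illustrative):

# apps.py of a contributed Django app
from django.apps import AppConfig

class MyCustomAppConfig(AppConfig):
    name = "my_custom_app"
    verbose_name = "My Custom App"

    # Optional navigation metadata described above
    url_home = "my_custom_app:home"
    app_order = 10
    fa_icon_class = "fa-chart-bar"
    app_description = "An example contributed Django app."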

There’s still the issue of sub-menu items that need to be integrated. Currently apps should inherit from the base.html template and add sub-menu items into a navigation items block that appears on the left side of the portal page. This could perhaps be better implemented as AppConfig metadata.

Speeding up SSH by Reusing Connections

From https://puppet.com/blog/speed-up-ssh-by-reusing-connections:

One way to enable this feature is to add the following to your ~/.ssh/config:

  Host *
      ControlMaster auto
      ControlPath ~/.ssh/sockets/%r@%h-%p
      ControlPersist 600
  

In a quick test with a particular host running ssh user@host whoami takes about 0.8s without that setting and takes about 0.1s with the setting above. Really speeds up bash-completion with scp.
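
One thing to note: ssh won’t create the ControlPath directory for you, so make sure the ~/.ssh/sockets directory exists first (mkdir -p ~/.ssh/sockets).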

Ajax file upload with fetch and FormData

Uploading files via Ajax is easier than ever thanks to a couple of fairly recent web APIs: fetch and FormData. I’ll show how to combine the two to upload files with an Ajax request.

If you’re like me you have probably heard of fetch. It is basically a nicer API for creating and dealing with XMLHttpRequests. I had only briefly encountered FormData until recently, when I worked on file upload for the upcoming Airavata Django portal. FormData provides an interface for adding key/value pairs as would be submitted by a web form, but it can also handle files that the user has selected to upload.

So let’s say you have an <input id="profile-pic" type="file"> element in your page and you want to use Ajax to upload the file when selected by the user. You can upload the file like so:

let profilePic = document.querySelector('#profile-pic');
let data = new FormData();
// files[0] is the File object for the file the user selected
data.append('file', profilePic.files[0]);
let uploadRequest = fetch('/profile/picture-upload', {
    method: 'post',
    body: data,
})
    .then(response => response.json())
    // here you would handle success response and error cases

You can also post additional non-file values, just like in an HTML form. So for example you could do the following:

let profilePic = document.querySelector('#profile-pic');
let data = new FormData();
data.append('file', profilePic.files[0]);
data.append('user-id', userId);
let uploadRequest = fetch('/profile/picture-upload', {
    method: 'post',
    body: data,
})
    .then(response => response.json())
    // here you would handle success response and error cases

fetch automatically chooses the right content type to use when submitting the request based on the kind of data in the FormData object. As you might know, when you have a form that uploads files you need to set the enctype attribute on the form to multipart/form-data, which encodes the form values as multipart MIME parts. fetch applies this encoding and sets the Content-Type header to multipart/form-data (with the correct boundary parameter), so there is no need to specify it yourself.

For Contributors: Managing a fork and keeping it up to date

Let’s say you have forked a repository on GitHub and you’ve started working on a feature or a bug fix that you plan on contributing as a pull request at some point. One issue you’ll run into eventually is how to keep your fork up to date with the upstream repository. Here are some instructions I’ve followed to manage a fork and keep it up to date.

First of all, create a new branch on which to make all of your commits. Theoretically, if you only ever plan on contributing one pull request, you can get away with adding the commits to the main branch (e.g., master). However, it makes it harder to contribute multiple pull requests and it makes it harder to keep that main branch up to date.

Create and switch to a new branch. For the following examples I’ll assume I’ve forked the django repo.

git clone https://github.com/machristie/django.git
cd django
git checkout -b my-branch

After you make some commits and are ready to push them to your fork, do the following:

git push -u origin my-branch

The -u option to git push means to set this branch (my-branch) to track this remote branch (origin/my-branch) as its upstream branch. What this means is that from here on you can simply run git push or git pull and git will push/pull to this remote branch (origin/my-branch). Just to clarify: origin here is your fork.

Second, now that you are doing all of your work off the main branches, set the main branches to track the remote branches in the upstream repo. Then you can easily do a git pull to bring your local main branch up to date.

Let’s add the upstream repo as a Git remote.

git remote add upstream https://github.com/django/django.git
git fetch upstream

Now let’s set the master branch to track the remote branch in this upstream repo.

git checkout master
git branch --set-upstream-to=upstream/master

Now we can do a git pull

git pull

The main branch(es) now are set up to sync with the upstream repo while the branches you create are set up to sync with your fork.

By the way, if you want to update a main branch in your forked repository you can do that too. For example, to update the master branch in your fork to bring it up to date with the master branch in the upstream repository you would switch to your master branch, update it with upstream, then push those updates to your fork:

git checkout master
git pull
git push origin master

At some point you’ll want to update your branch so it has the most recent commits from the main branch. For example, let’s say you create my-branch off of the master branch and now you want to update your my-branch with the latest commits on master that have landed in the upstream repository in the meantime. To do this you can first pull in updates for master from the upstream repo and then rebase your branch with master. Here’s how:

git checkout master
git pull
git checkout my-branch
git rebase master

git rebase master rewrites your commits on your branch on top of the latest commits on master.

Of course, when you do this you might run into merge conflicts. When you run git rebase master it will try to re-apply all of your commits, one by one, on top of the last commit on master. If there are changes on master and also on your branch to the same lines of code in a file, git won’t be able to figure out how to automatically merge those changes. The rebase will stop and tell you that there are merge conflicts. The output of git status will contain a section called Unmerged paths, something like the following:

Unmerged paths:
  (use "git add <file>..." to mark resolution)

    both modified:      somefile.txt

For each one of the files listed in Unmerged paths:
1. Open the file in a text editor and inspect the conflicting changes. The conflicts are delimited by <<<<<<<, ======= and >>>>>>> markers.
2. Decide which side to keep or manually merge the changes, as needed. Save the file.
3. Run git add path/to/file

Once you’ve resolved all of the merge conflicts, run:

git rebase --continue

Git will continue with the rebase as best it can. It may stop the rebase process multiple times if there are more merge conflicts.

For more information on resolving merge conflicts, see the section on Basic Merging in the Pro Git book.

Mapping API Responses to JavaScript Models

One design goal I’ve had in mind with frontend code for airavata-django-portal is to keep the “business logic” separate from the UI code, especially to keep it separate from any UI framework code. I got this idea from another blog post that I can’t quite find at the moment, but the gist is to keep the core application logic separate from the UI framework code, which makes it easier to switch to a different UI framework if needed. It’s also just a good separation of concerns: there’s no need to tie up pure domain logic with Vue or React or Angular specific framework code.

For the most part, this common application logic code is made up of Models and Services. This post focuses on Models and how to instantiate them from REST API responses. Models are client-side representations of server-side models, so one responsibility of model classes is to handle deserialization and serialization of API responses. Another responsibility of models is to implement validation logic. In general any domain specific behavior should be implemented by these model classes.

So one problem is: how do we efficiently map from a JSON response to properties on these model classes? I’ve been working on that this week. For the most part JSON.parse takes care of the simpler aspects of deserialization: strings, numbers, and booleans are all converted to native data types. But there remain a few additional considerations:

  • Dates aren’t encoded as such in JSON and require special handling. The approach I’ve been taking is to encode dates as strings in ISO-8601 format, in the UTC timezone.
  • Mapping nested data structures to nested model classes.
  • Handling lists of simple data values or nested model instances.

There are potentially other considerations like mapping a property with a certain name in the response to a differently named property on the model, but we’re taking the approach of keeping the property names the same on the responses and model classes.

I created a BaseModel.js that has a constructor that takes two arguments. The first argument is an array of metadata that defines all of the “fields” or properties of the model class. The second argument is optional and is the data, which would typically be an API response.

There are two ways to define a field. The first way is to just specify the name of the field. In this case no special conversion logic is performed: if the field’s value is a string then that is the type of the field, etc. The second way to define a field is to specify not only the name of the field but also its type and optionally whether it is a list and what default value it should have if there is no value for this field in the data.

The implementation of the BaseModel constructor is fairly straightforward. Here’s an example of how the BaseModel class would be used:

import BaseModel from './BaseModel'
import InputDataObjectType from './InputDataObjectType'
import OutputDataObjectType from './OutputDataObjectType'


const FIELDS = [
    'applicationInterfaceId',
    'applicationName',
    'applicationDescription',
    {
        name: 'applicationModules',
        type: 'string',
        list: true,
    },
    {
        name: 'applicationInputs',
        type: InputDataObjectType,
        list: true,
    },
    {
        name: 'applicationOutputs',
        type: OutputDataObjectType,
        list: true,
    },
    'archiveWorkingDirectory',
    'hasOptionalFileInputs',
];


export default class ApplicationInterfaceDefinition extends BaseModel {


    constructor(data = {}) {
        super(FIELDS, data);
    }
}
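
For reference, here’s a minimal sketch of what a BaseModel constructor along these lines could look like (the actual BaseModel.js in airavata-django-portal differs in the details):

// BaseModel.js (sketch)
export default class BaseModel {
    /**
     * fields: array of field names (strings) or field descriptors
     * ({name, type, list, default}); data: a plain object, e.g. a parsed API response
     */
    constructor(fields, data = {}) {
        fields.forEach(field => {
            if (typeof field === "string") {
                // Simple field: copy the value through unchanged
                this[field] = data[field];
            } else {
                const { name, type, list = false, default: defaultValue = null } = field;
                const value =
                    name in data && data[name] !== null ? data[name] : defaultValue;
                if (value === null || typeof value === "undefined") {
                    this[name] = value;
                } else if (list) {
                    this[name] = value.map(item => this.convertValue(item, type));
                } else {
                    this[name] = this.convertValue(value, type);
                }
            }
        });
    }

    convertValue(value, type) {
        switch (type) {
            case "date":
                // Dates are serialized as ISO-8601 strings in UTC
                return new Date(value);
            case "string":
            case "number":
            case "boolean":
                // JSON.parse already produced the native type
                return value;
            default:
                // Assume `type` is a model class and instantiate the nested model
                return new type(value);
        }
    }
}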

Two-factor Authentication for write-enabled Apache GitHub repos

Some Apache projects, such as Apache Airavata just recently, have made their GitHub repos writeable. However, to actually push to an Apache GitHub repo you need to enable two-factor authentication (2FA) in GitHub. With 2FA some additional work is needed to authenticate with GitHub from a Git client.

First Steps

First thing you need to do is link your Apache account with your GitHub account and enable 2FA on GitHub if you haven’t already done that. Go to https://gitbox.apache.org/setup/ and follow the instructions there.

Using a Personal Access Token

Now that you have 2FA enabled on your GitHub account you can no longer use your GitHub username and password to authenticate with GitHub from a Git client. Instead of your GitHub password you can use a Personal Access Token.

Personal Access Token screen

  1. Generate a Personal Access Token in GitHub.
  2. Give it a name.
  3. Check the repo scope.
  4. Create the token and copy it. (Make sure to securely save this token somewhere; you won’t be able to view it again later.)
  5. When doing git push, provide your GitHub username and this personal access token instead of your password.

Also, you’ll want to store this personal access token in a keychain-type service so you don’t have to provide it each time you do a push. If you haven’t already done so, configure a credential helper for Git.
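
For example, on macOS you can tell Git to use the system keychain:

git config --global credential.helper osxkeychain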

Using an SSH Key

Alternatively, you can set up an SSH key for authenticating with GitHub. I didn’t do this but Suresh reported that this works as well.

See GitHub’s documentation on working with SSH keys for more information about this approach. The gist of it is that you can use an SSH key you already have or you’ll need to generate a new one. Then you’ll need to add the public key portion to GitHub. Finally, GitHub has some instructions on how you can automatically add your private key passphrase to your ssh-agent so you don’t need to ever type your private key passphrase.

Reflections on the SGCI Bootcamp

From October 2nd to the 6th I attended the SGCI (Science Gateways Community Institute) Bootcamp, which is a workshop to help science gateway developers and project leaders to develop a sustainability plan for their gateways. The Bootcamp took place at the Purdue Research Park, just a short distance from the Indianapolis Airport.

I went there with Dr. Sudhakar Pamidighantam, the PI of the SEAGrid science gateway. As members of the Science Gateways Research Center we have several projects to which we could have applied the Bootcamp (the Apache Airavata open source project, our SciGaP “Platform-as-a-Service” hosting project, etc.), but we decided to focus on SEAGrid, which is one of the main science gateways we support.

Day 1

We’re close enough that we drive up to the Purdue Research Park on Monday. Michael Zentner kicks things off by giving an overview of the Bootcamp and the goals for the week. We spend a little time introducing ourselves to each other, but we’ll come to know more about each other and our projects later in the week through the various Bootcamp activities.

What is Sustainability?

Nancy Maron leads the first session, What is Sustainability?. Sustainability is defined as “the ability to get the resources (financial and otherwise) needed to maintain and increase the value of your gateway”. So one question to ask is, what is the value of our gateway, SEAGrid? And the value of SEAGrid is different for different groups of people.

We need to think about the value SEAGrid provides to

  • end users (e.g., gateway users and developers)
  • stakeholders (e.g., project PIs, institutional leadership)
  • partners (e.g., HPC centers, campus computing centers, other gateway service providers)
  • volunteers (e.g., open source contributors)

We also need to think about our competition: what are the alternatives out there to the services provided by SEAGrid? That will affect the value that SEAGrid provides as well.

Ultimately we’ll want to develop a strong value proposition, which we’ll come back to later. But a strong value proposition needs to articulate the unique value that our service provides that is compelling, over and above the competition, to our user audience.

Key takeaways:

  • must develop a great idea into a strong value proposition
  • the development and implementation of a sustainability plan is an ongoing process

Napkin Drawing

The next session was led by Juliana Casavan. The idea here is to communicate, in a non-technical fashion and mostly using pictures, the value of our gateway to end users. The setup is this: if you had to describe your gateway to someone using only a drawing on a napkin (and refraining from using technical jargon), how would you do it?

I think this exercise was really helpful and it was probably my favorite exercise of the week. We so often think in very technical terms about our gateway and this was a great way to try to see our gateway from the perspective of an end user.

Key takeaway:

  • When describing your gateway (or product, service, whatever), you don’t need to provide all of the details. Leaving out details creates intrigue. Intrigue leads to your audience asking questions so that they are the ones driving the conversation. This is a much more effective way to communicate your value than simply enumerating each and every little thing your gateway does.

Value Proposition

Nancy led this session on formulating a value proposition. She gave us a very simple template to work from that goes like this:

  • My product
  • will help who?
  • to do what?
  • by how?

Here’s what we came up with for SEAGrid:

  • SEAGrid Science Gateway
  • will help computational scientists and engineers
  • to set up model systems, define simulation and job parameters, manage execution and data, and analyze the results
  • from a single point of access

Key takeaway:

  • A value proposition is not something you just make up. It is something that you discover. It starts out as a hypothesis that is then tested by learning from your users what they value or don’t value about your product. And it needs to be reviewed and refined over time.

That concluded our first day. There’s more to say so I’ll write some more about the rest of the week later.

Matrix similarity and eigenvalues

Introduction

Recently I was reading Data Science from Scratch by Joel Grus. One of Grus’ prerequisites for doing data science is linear algebra. I took a linear algebra course as a freshman but I didn’t do particularly well in it, and what little I learned I’ve long since forgotten. So I’ve decided to brush up on my linear algebra by working through Linear Algebra by Jim Hefferon, a free online textbook recommended by Grus.

In the fifth and final chapter of Linear Algebra the author’s program is to come up with a normal form for matrices, a canonical representative for each equivalence class of similar matrices. The chapter starts by introducing the notion of matrix similarity. If two matrices are similar then they are essentially the same transformation, just expressed in different bases.

The normal form the chapter builds to is the Jordan normal form. There are matrices that are diagonalizable, and for these matrices a diagonal matrix is the normal form (more on this later). But not all matrices are diagonalizable. Some are nilpotent. A nilpotent matrix is one that, when applied to itself enough times, eventually becomes the zero matrix. The canonical form of a nilpotent matrix is all zeros except for some ones on the subdiagonal. Jordan normal form essentially combines these two canonical forms to create a canonical form that can represent any matrix.

That’s about all I want to say about Jordan normal form. Here I’ll be focusing on diagonalizability and how to compute eigenvalues and eigenvectors.

Diagonalizable matrices

I introduced matrix similarity by saying that similar matrices represent the same transformation just in different bases. More formally, two matrices, A and B are similar if there exists a matrix P such that A = P B P^{-1}. This P matrix is a basis conversion matrix and converts from B’s basis to A’s basis; the inverse converts in the opposite direction.

A diagonal matrix is defined as a matrix that is all 0’s except for non-zero entries on the diagonal. A matrix T is diagonalizable if there is a matrix P such that P T P^{-1} is a diagonal matrix. So the diagonal representation of T is similar to T and any matrix that is similar to T is also similar to the diagonal representation of T. Later we’ll see how to determine if a matrix is diagonalizable.

Let’s think a little bit about what a diagonal matrix represents. Let’s say we have a diagonal matrix T with n rows and columns. T is a matrix with all zeroes except that in the i-th row there is a diagonal entry that we’ll call \lambda_i. The basis of the domain and codomain is the same and the vectors that make up this basis we’ll call \beta_i. Then, for each basis vector:

T \beta_i = \lambda_i \beta_i

Let’s look at an example. Let T be the following diagonal 2×2 matrix with the natural basis:

\begin{pmatrix}
2 & 0 \\
0 & 3
\end{pmatrix}

So for T, \lambda_1 = 2, \lambda_2 = 3, \beta_1 = (1, 0) and \beta_2 = (0, 1). We can easily show that T \beta_1 = \lambda_1 \beta_1:

\begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}
\begin{pmatrix} 1 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 2 \\ 0 \end{pmatrix}
=
2 \begin{pmatrix} 1 \\ 0 \end{pmatrix}
=
\lambda_1 \beta_1

And likewise for \lambda_2 and \beta_2.

For a diagonal matrix, we’ll call the \lambda_i the eigenvalues and the \beta_i the eigenvectors. If we can develop a way to compute the eigenvalues and their associated eigenvectors for a matrix then we can construct the diagonal representation of a matrix which would simply have the \lambda_i as the diagonal entries and the basis would be the eigenvectors.

Computing Eigenvalues and Eigenvectors

So how do we find eigenvalues? What we’ll do is solve for T \vec{v} = \lambda \vec{v}. This becomes

T \vec{v} - \lambda \vec{v} = \vec{0} \\
(T - \lambda I) \vec{v} = \vec{0}

(T - \lambda I) \vec{v} = \vec{0} is a homogeneous system, and if T - \lambda I is nonsingular the only solution for \vec{v} is \vec{0}. (A nonsingular matrix is one for which the system has a unique solution, and for a homogeneous system that unique solution is \vec{0}.) That’s not what we want; we want non-zero solutions for \vec{v}. That will only happen if T - \lambda I is singular, which is true when |T - \lambda I| = 0, that is, when the determinant of T - \lambda I is 0 (see footnote 1). This equation, |T - \lambda I| = 0, is called the characteristic equation.

Let’s look at an example. Let T be

\begin{pmatrix}
3 & 0 \\
8 & -1
\end{pmatrix}

Then

|T - \lambda I| =
\begin{vmatrix}
3 - \lambda & 0 \\
8 & -1 - \lambda
\end{vmatrix}
= (3 - \lambda)(-1 - \lambda) - 0 \cdot 8 = (3 - \lambda)(-1 - \lambda) = 0

This equation has two solutions, \lambda = -1, 3. (Note: computing a determinant for a matrix of dimension 2 is relatively simple. To compute a determinant in higher dimensions see Laplace’s expansion.)

For each eigenvalue we can find a corresponding set of eigenvectors. Let’s start with the eigenvalue -1. We want to find vectors for which T\vec{v} = -1\vec{v}.

\begin{pmatrix} 3 & 0 \\ 8 & -1 \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
-1
\begin{pmatrix} a \\ b \end{pmatrix}

This becomes the following set of equations:

3a = -a \\
8a - b = -b

The only solution for the first equation is a = 0. The second equation then reduces to b=b which has an infinite number of solutions. Therefore the following is the set of eigenvectors for eigenvalue -1.

\left\{
\begin{pmatrix} 0 \\ b \end{pmatrix}
\;\middle|\;
b \in \mathbb{R}
\right\}

This set is called an eigenspace. We can use any non-zero vector from this space as an eigenvector, for example if we want to get a basis vector.

Similarly, plugging in the eigenvalue 3 yields

3a = 3a \\
8a - b = 3b

The first equation has an infinite number of solutions. The second equation reduces to b = 2a. Therefore the following is the set of eigenvectors for the eigenvalue 3.

\left\{
\begin{pmatrix} a \\ 2a \end{pmatrix}
\;\middle|\;
a \in \mathbb{R}
\right\}

So the diagonal representation of T is

\begin{pmatrix}
-1 & 0 \\
0 & 3
\end{pmatrix}

We can choose two eigenvectors, corresponding in order with the eigenvalues in the above matrix, to make up the basis:

\langle
\begin{pmatrix} 0 \\ 1 \end{pmatrix},
\begin{pmatrix} 1 \\ 2 \end{pmatrix}
\rangle

Now that we can compute eigenvalues we can easily determine if a matrix is diagonalizable: a matrix of dimension n is diagonalizable if it has n distinct eigenvalues because we can use those eigenvalues to make up the diagonal entries of the matrix and the eigenvectors associated with them make up the basis. Let P be the matrix whose column vectors are these basis vectors. Then the diagonalization of T is \hat{T}:

\hat{T} = P^{-1} T P
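
For the running example, taking the columns of P to be the eigenvectors chosen above, a quick check confirms this:

P =
\begin{pmatrix}
0 & 1 \\
1 & 2
\end{pmatrix},
\qquad
P^{-1} =
\begin{pmatrix}
-2 & 1 \\
1 & 0
\end{pmatrix},
\qquad
P^{-1} T P =
\begin{pmatrix}
-1 & 0 \\
0 & 3
\end{pmatrix}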

Conclusion

Now we can not only determine if a matrix is diagonalizable but also produce the diagonal representation of a matrix and its basis. Why would we want to diagonalize a matrix? For a diagonalizable matrix, the diagonal representation is the easiest to work with. One example of how diagonal matrices are easier to work with is matrix exponentiation. If you have a diagonal matrix T with diagonal entries \lambda_i then T^n is a diagonal matrix with diagonal entries \lambda_i^n. Thus T^n is very easy to compute. If on the other hand T isn’t diagonal but it is diagonalizable, and \hat{T} is the diagonal representation of T, then T^n is

T^n = (P \hat{T} P^{-1})(P \hat{T} P^{-1}) \cdots (P \hat{T} P^{-1}) = P \hat{T}^n P^{-1}

Since computing \hat{T^n} is very simple, it is worth it to transform into its basis and back again as a way of computing T^n.
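
Continuing the running example (eigenvalues -1 and 3, with P as above):

\hat{T}^n =
\begin{pmatrix}
(-1)^n & 0 \\
0 & 3^n
\end{pmatrix},
\qquad
T^n = P \hat{T}^n P^{-1} =
\begin{pmatrix}
3^n & 0 \\
2 \cdot 3^n - 2(-1)^n & (-1)^n
\end{pmatrix}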

In addition to practical applications, can we also develop an intuition about eigenvectors from the preceding discussion? Since a linear transformation applied to an eigenvector doesn’t change its direction, the eigenvectors of a linear transformation can be thought of as “axes” of the linear transformation. We can see that from how the eigenvectors of a matrix are used to form the basis vectors for the diagonal representation. The associated eigenvalues can be thought of as the amount of “stretch” along those axes that the linear transformation imparts to vectors to which it is applied. So in a sense the eigenvectors and eigenvalues describe the action of a linear transformation on the vectors to which it is applied, and this description appears to be about as succinct as possible.


  1. Why is a matrix singular if the determinant is 0? Recall that a singular matrix does not have a unique solution. One way to determine if a matrix is singular is to reduce it via Gaussian elimination to echelon form. If the result does not have a leading coefficient in each row then it will not have a unique solution. But if it is missing a leading coefficient in one row then it must have a zero in that diagonal entry. One way to compute a determinant is to reduce a matrix to echelon form and then take the product of the diagonal entries. So the determinant will be 0 only if there is a 0 on the diagonal, which happens when there is a missing leading coefficient.