Bee Tracking: A List of Improvements needed…

Several issues became apparent during and after the 2021 summer experiments! I’ve listed some of these issues and the planned solutions below. This isn’t a very exciting post – skip down to the last few posts, which have 3d videos!

Detectable Posts: Although fairly easy to detect, sometimes non-posts get detected. I think a few more stripes would make finding them easier (at the cost of a slight reduction in the maximum detection distance), e.g. going from 11 stripes to 15.


Improved pot design: It turns out we can tag the bees in-field with modified marking pots, but need to add lids to stop them escaping!


Tag contamination: Wax on the retroreflectors degrades them; try using alternative materials.


Firing rate: This is a big challenge, with several potential bottlenecks. I found the flash thermal-protection circuits kicked in. I did experiment with only firing half of the flashes each time, but I’ve not looked at the results of this yet. I might be able to disable the protection circuit! The other problem was that many images failed, probably because buffers weren’t being serviced quickly enough. I will need to experiment (I might be able to downscale the image 2×2). A rate of 4Hz is currently possible for 10 seconds.


Tag shape / detectability – related to the above issue is that the tags aren’t always visible, possibly because of their shape. Redesigning to make them visible from all directions (e.g. a cone, cylinder or sphere) could help.


Timing alignment – a major issue I found was that the Raspberry Pi-generated timestamps often ‘jumped’ or ‘froze’. Given the importance of precise timing, this is a major problem. The solution I’m building is a (GPS-time-synced) ‘clock’ that simply displays the time to the nearest 4ms in binary, in a cylinder (LED driver, LEDs, GPS receiver, Arduino [e.g.]). A sketch of the binary encoding is at the end of this list.

PCB for one part of the ‘binary clock column’. This will give the GPS synchronised time to 1ms accuracy in LEDs.


Range – I should keep looking at how to improve the range: switching to glass corner-cube reflectors? A focused flash? If the frequency is increased, could it be traded for a tighter ‘high gain’ beam?
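
As a rough illustration of the timing-column idea above, here’s a minimal sketch (in Python rather than Arduino code; the 12-LED column size is an assumption of mine) of encoding a time of day to the nearest 4ms as a binary LED pattern:

N_LEDS = 12    # hypothetical number of LEDs in the column
TICK_MS = 4    # display resolution in milliseconds

def led_pattern(ms_since_midnight):
    """Return a list of 0/1 LED states encoding the time in 4ms ticks."""
    ticks = (ms_since_midnight // TICK_MS) % (1 << N_LEDS)  # wraps every ~16s with 12 LEDs
    return [(ticks >> i) & 1 for i in range(N_LEDS)]        # least significant bit first

# e.g. 13:58:00.020 expressed as milliseconds since midnight
ms = ((13 * 60 + 58) * 60 + 0) * 1000 + 20
print(led_pattern(ms))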

First Learning Flight Reconstructed

I applied the Gaussian process approach to the first learning flight recorded from the new nest. Unfortunately two of the four cameras weren’t very useful (one pointed the wrong way, and the other had a battery failure!), so the path is more uncertain than I would like.

Projected path reconstruction onto four camera images. Green line = posterior mean. Solid = less than 30cm standard deviation. Dashed = less than 60cm standard deviation. Missing = greater uncertainty. Yellow dots = locations (with times, in seconds) of the observations for that camera. The green numbers are the times along the trajectory.
Same data as above, but zoomed out to the entire image size. Removed observations to reduce clutter. Higher resolution version.


(First ten seconds of the trajectory. Small moving dots are samples from the posterior distribution, to give an idea of uncertainty. Similarly, the shade of the line indicates uncertainty, with white/hidden = a standard deviation of more than 50cm in the prediction. Big dots are the posts and nest. The triangles are the camera locations.)

Next I’ll look at other learning flights and start doing some cross validation on the predicted flight path trajectory.


Simulated 3d flight path reconstruction

Last week I was using a particle filter to model the bees, but this morning I tried out (with a very simple simulated dataset) using a multi-output Gaussian process. I think these results look better*:

The dots are samples to show uncertainty, set to 1/3 of the true standard deviation for clarity.

The straight lines are from the ‘photographs’ taken from two cameras. The green helix is the ‘true’ path of the simulated bee.

I can’t believe how quick it is to use stochastic variational inference! The slowest bit was having to write some code to do batch-compatible cross-products in TensorFlow (a sketch of what I mean is below). Notebook here.
I’ve already run the particle smoother on some learning flights, but will try this out on them instead…
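
For reference, here’s a minimal sketch of the kind of batch-compatible cross-product I mean (an illustration, not the notebook code; I believe recent TensorFlow versions also provide tf.linalg.cross for this):

import tensorflow as tf

def batch_cross(a, b):
    """Cross products of two batches of 3-vectors, shape [..., 3]."""
    ax, ay, az = tf.unstack(a, axis=-1)
    bx, by, bz = tf.unstack(b, axis=-1)
    return tf.stack([ay * bz - az * by,
                     az * bx - ax * bz,
                     ax * by - ay * bx], axis=-1)

a = tf.constant([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
b = tf.constant([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(batch_cross(a, b))  # [[0, 0, 1], [1, 0, 0]]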

Reconstructing Flight Paths

Progress so far:

1) Automatic detection of the markers.
To make life easier in the long run I made posts that are hopefully visible from up to 50m away using the tracking system’s optics and are machine readable (see here). The tool finds the location/orientation of the markers in the image. It then uses the top and bottom of each to help with alignment.

Finding the posts automatically. Numbers are the id of the post and a % confidence.

2) Automatic registration of the cameras and markers. This was quite hard.
a) I’ve provided the distance between lots of pairs of landmarks and cameras (measured on the day), and approximate x,y coordinates for all of them. The tool then does a rough optimisation step to work out roughly where everything is (i.e. the x,y coordinates of all the objects are adjusted to make the measured distances match):

[Plot: the roughly optimised x,y layout of the cameras and landmarks.]
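
As a minimal sketch of the kind of rough distance-matching optimisation in step 2a (illustrative only – the names, coordinates and measured distances below are made up, and this isn’t the actual tool), scipy’s least_squares can adjust the x,y coordinates until the pairwise distances match the measurements:

import numpy as np
from scipy.optimize import least_squares

# Hypothetical example: approximate x,y guesses for each object (cameras and
# posts), plus distances measured between pairs of them on the day.
names = ['cam1', 'cam2', 'post1', 'post2']
xy0 = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0], [12.0, 7.0]])
measured = [('cam1', 'cam2', 10.3), ('cam1', 'post1', 9.2),
            ('cam1', 'post2', 13.9), ('cam2', 'post1', 9.5),
            ('cam2', 'post2', 7.4), ('post1', 'post2', 7.1)]
idx = {n: i for i, n in enumerate(names)}

def residuals(flat_xy):
    xy = flat_xy.reshape(-1, 2)
    dist_err = [np.linalg.norm(xy[idx[a]] - xy[idx[b]]) - d for a, b, d in measured]
    prior_err = 0.1 * (flat_xy - xy0.ravel())   # weak pull towards the rough initial guesses
    return np.concatenate([dist_err, prior_err])

result = least_squares(residuals, xy0.ravel())
print(dict(zip(names, result.x.reshape(-1, 2).round(2))))  # refined x,y positions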

b) Once all the landmarks are located in an image from each of the cameras (see step 1), the orientation (yaw, pitch, roll), fov and position of the cameras and the markers are optimised to make their 3d locations ‘match’ the photos.

The blue crosses mark the ‘rendered’ locations of the objects from their 3d estimated positions. The yellow crosses are the identified locations in the images. Note: ignore the ‘nest’ marker – the ‘nestboxleft/right’ markers on the right of the image show the location of the nest.

3) Detect Bee. We then run the standard tag detection algorithm for each camera, which produces image coordinates of possible detected bees. [edit: we then manually confirm each detection].

4) Compute 3d path. To get the 3d path, I’m using a particle filter/smoother [edit: later we use a Gaussian process, but the images here are from the PF] to model the path of the bee. This gives an estimated x,y,z and a distribution over space (which gives us an idea of its confidence). Using the camera configurations determined in step 2 and the coordinates from step 3: each coordinate in step 3 equates to a line the bee might lie on in 3d space… I’ll skip the details now… the upshot is you end up with a trajectory in 3d. I’ve rendered the trajectory back onto the 4 camera images.
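
The geometric core of this, as a minimal sketch (illustrative only – the ray origins and directions are made up, and the real system uses the filter/smoother rather than a single least-squares fit): each detection gives a ray from its camera, and the point closest to all the rays gives a 3d estimate.

import numpy as np

def closest_point_to_rays(origins, directions):
    """Least-squares 3d point closest to a set of rays (origin + direction)."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector onto the plane perpendicular to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

# Hypothetical rays from two cameras whose detections correspond to the same bee
origins = [np.array([0.0, 0.0, 1.0]), np.array([20.0, 0.0, 1.0])]
directions = [np.array([1.0, 1.0, 0.2]), np.array([-1.0, 1.0, 0.2])]
print(closest_point_to_rays(origins, directions))   # roughly [10, 10, 3]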

Camera 1: The blue line is the trajectory (I’ve not smoothed between observation times yet) and the yellow circles indicate one standard error (i.e. it gives us an idea of how confident the model is about the location of the bee). The numbers are seconds.

Camera 2: The blue line is the trajectory (I’ve not smoothed between observation times yet) and the yellow circles indicate one standard error (i.e. it gives us an idea of how confident the model is about the location of the bee). The numbers are seconds.

Camera 3: The blue line is the trajectory (I’ve not smoothed between observation times yet) and the yellow circles indicate one standard error (i.e. it gives us an idea of how confident the model is about the location of the bee). The numbers are seconds.

Camera 4: The blue line is the trajectory (I’ve not smoothed between observation times yet) and the yellow circles indicate one standard error (i.e. it gives us an idea of how confident the model is about the location of the bee). The numbers are seconds.

The trajectories above are 3d, projected back onto the photos.

The nest had been moved a few hours earlier, so we wouldn’t necessarily expect any learning flights really.

Looking at the photos carefully, I think the bee heads left along the hedge (in the 2d-detection images below, the blue crosses are where the system thinks it’s found a bee; the rest are just debugging info – i.e. where it’s looked). The smoother had low confidence about where the bee was after ~7 seconds. If I went through and manually clicked on the ‘correct’ targets it would be able to reconstruct more of the path. Note the bee at the top of the hedge.

[Camera images with the 2d detections marked.]

I’ve just run it on one flight so far (from the 20th at 13:58:00)!

After the first day at the field site, I got a bit nervous and moved the cameras a bit closer to the nest as I was worried the system wouldn’t see the bee. I think the bee is detected quite well, in retrospect, but the problem now is the cameras are slightly too close really! I should have had more faith in the system!

Bee Tracking in Devon

Last week I visited Exeter University’s bumblebee field site, where we set up four tracking systems around a new hive.

The first issue we found was that the bees tagged in the lab weren’t visible in the retroreflector images. We suspect this is due to wax getting on the tags. I’ll be investigating different retroreflector tag materials to see which is more immune to the effect of wax (either less gets on the tag, or it works even with some deposited). My hunch is that corner-cube plastic-coated reflectors will do better – both because the wax might stick less, but also because they don’t depend on the refractive-index change between the surface of glass beads and air.

I found I could tag the bees with new tags in the field using my custom marking pots. These have wider gaps, which make it quite easy for the bee to escape. The next version will have a second mesh on top, so the bee stays trapped until the field worker is ready (and the bee is immobilised by being held against the mesh by the sponge).

I suspect that the tag signal is unreliable simply because the ridge/tag is only visible from the side or above, so I need to look at alternative tag shapes (rather than just a single ridge).

Next post: The analysis!…

Making HMC mix better

tl;dr: for MCMC for VI, use the posterior covariance of q(u) from an initial standard-VI run to whiten the latent variables, rather than the prior covariance?

I’m new to both Variational Inference and MCMC/HMC but I’ve already experienced the well known issues associated with each.

  • Variational Inference is well known for being poor at properly quantifying the uncertainty,

we do know that variational inference generally underestimates the variance of the posterior density; this is a consequence of its objective function. But, depending on the task at hand, underestimating the variance may be acceptable [1]

  • HMC/MCMC often struggles with Gaussian process models,

The common approach to inference for latent Gaussian models is Markov chain Monte Carlo (MCMC) sampling. It is well known, however, that MCMC methods tend to exhibit poor performance when applied to such models. Various factors explain this. First, the components of the latent field x are strongly dependent on each other. Second, θ and x are also strongly dependent, especially when n is large. [2]

One solution is to “whiten” the latent variables for the HMC, decorrelating them. This is described in [3] and was also suggested to me in this github issue. In both, the prior covariance matrix is used to initially transform the latent variables.

In my application however I’ve found that isn’t sufficient. My application is calibrating lots of [low cost] air pollution sensors scattered across a city (one or two sensors are ‘reference’ and are assumed to give correct measurements). Some sensors can move, and it’s assumed that when sensors are colocated they’re measuring the same thing. In my model each sensor has a transformation function (parameterised by a series of correlated random variables that have GP priors over time); this function describes the scaling etc. that the cheap sensors apply to the real data. We assume these can drift over time. For now imagine this is just a ‘calibration scaling’ each sensor needs to apply to the raw signal to get the ‘true’ pollution, at a given time.

A-priori the scaling for each sensor is assumed to be independent, but once two sensors (A & B) have been in proximity their (posterior) scalings are very strongly correlated. If B is colocated with sensor C, and C with D, etc., this leads to very strong correlations in these scaling variables across the network. Importantly these correlations are induced by the observations and are only in the posterior, not the prior. The upshot is that the whitening provided by the prior covariance doesn’t help decorrelate the latent variables much, and I found the HMC really struggled to mix.

The Idea
I was thinking an obvious solution is to run ‘normal’ Variational Inference first, using a standard multivariate Gaussian for the approximating distribution, q(u). Then, once it’s (very) roughly optimised, use the covariance of q(u), instead of the prior covariance, for the whitening. This way we hopefully get the benefit of the more complicated distribution HMC provides, while it can still sample/mix successfully.
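
A minimal sketch of the reparameterisation I mean (illustrative only, not the model code): HMC moves in a whitened variable v, and u is recovered through the Cholesky factor of whichever covariance is used for the whitening – the only change proposed here is taking that covariance (and mean) from the rough q(u) rather than from the prior.

import numpy as np

# Whitened parameterisation: u = m + L v, with v ~ N(0, I).
# Standard choice:  L = chol(prior covariance), m = prior mean.
# Proposed choice:  L = chol(cov of q(u)), m = mean of q(u), from a rough initial VI run.

def to_u(v, m, L):
    """Map the whitened variable v back to the latent variables u."""
    return m + L @ v

def to_v(u, m, L):
    """Whiten the latent variables u (inverse of to_u)."""
    return np.linalg.solve(L, u - m)

# Toy example with a strongly correlated 2d 'posterior' covariance from VI:
S_q = np.array([[1.0, 0.95], [0.95, 1.0]])
m_q = np.zeros(2)
L_q = np.linalg.cholesky(S_q)
v = np.random.default_rng(0).standard_normal(2)   # the variable HMC would actually move in
print(to_u(v, m_q, L_q))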

[1] Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. “Variational inference: A review for statisticians.” Journal of the American Statistical Association 112.518 (2017): 859-877. link

[2] Rue, Håvard, Sara Martino, and Nicolas Chopin. “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71.2 (2009): 319-392. link

[3] Hensman, James, et al. “MCMC for variationally sparse Gaussian processes.” Advances in Neural Information Processing Systems. 2015. link

EM interference and xenon flash bulbs

I’ve started looking at bumblebee flight paths (especially during initial orientation flights from the nest). To this end I’ve wanted to increase the acquisition frequency of the system.

I found that firing more frequently than about every 1.6 seconds led to a marked decrease in the actual brightness of the flash. Instead I’ve rebuilt the electronics to allow the flashes to be fired sequentially. I had previously done this but found the flashes would often end up firing together. I assumed my circuit had a short, but I found the same happened again in my new circuit. I experimented for a while adding various capacitors across the supply rail (thinking maybe there was a ‘spike’ coming back through the two trigger wires from the flash); with the same thinking I added a reversed diode across those lines to catch any such spike. None of this helped.

I eventually found that spreading the flashes out across my desk helped a lot… this wasn’t expected, and suggested the problem was likely to be electromagnetic interference. Firing the flash involves considerable current*, which, even given the small size of the flash, probably equates to quite a large whole-spectrum EM pulse.

After chatting with helpful people on freenode’s ##electronics, I tried out tin-foil, and found (as long as each flash & wire was wrapped in its own foil & not touching) the 4 flashes would fire independently.

Quick foil experiment!

I’ll try out using some household building thermal-insulation tape to shield the flashes next, as that will be a little tidier. I also found direct sunlight got the black casings quite hot, so they will probably benefit from the silver foil for that reason too.

*back of the envelope calculation: if the flash (at 1/32 power) is about 3J over 10µs (=300kW), with, say, 300V across it, then the current will be on the order of 300kW/300V = 1kA.

Remove first digit from Swiss Coordinates

Quick tip. I’m using data from air pollution sensors provided here. The coordinates (Koordinaten) are of the form 2’710’500 / 1’259’810. These use the Swiss coordinate system. pyproj is the de facto standard for doing coordinate processing; this page also helped.

But I couldn’t make it work; those coordinate values were too big.

Eventually I realised one needs to remove the leading 1 and 2 from the two numbers. This is mentioned in the wikipedia article:

In order to nonetheless achieve a clear distinction between the two systems, an additional digit was added to the coordinates of LV95: any East coordinate (E) now starts with a 2, and any North coordinate (N) with a 1. Consequently, LV95 coordinates are given by pairs of 7-digit numbers, whereas LV03 used pairs of 6-digit numbers – for instance the coordinates (2 600 000m E / 1 200 000m N) in LV95 would be expressed as (600 000m E / 200 000m N) in LV03.

https://en.wikipedia.org/wiki/Swiss_coordinate_system
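
In other words, going from LV95 to LV03 is just a matter of subtracting 2,000,000 from the East value and 1,000,000 from the North value. For the example coordinates above:

# LV95 -> LV03: drop the leading 2 (East) and 1 (North)
E_lv95, N_lv95 = 2710500, 1259810
E_lv03, N_lv03 = E_lv95 - 2000000, N_lv95 - 1000000
print(E_lv03, N_lv03)  # 710500 259810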

In summary, my code now looks like:

from pyproj import Proj, transform
# LV03 (CH1903) coordinates, i.e. with the leading LV95 digit already removed
sites = []
sites.append({'name':'Zurich, Schimmelstrasse (ZH)','E':681942,'N':247245,'height':415})
sites.append({'name':'Zurich, Heubeeribüel (ZH)','E':685126,'N':248460,'height':610})
pWorld = Proj(init='epsg:4326')  # WGS84 lat/lon
pCH = Proj(init='epsg:21781')    # CH1903 / LV03
for site in sites:
    print(transform(pCH, pWorld, site['E'], site['N']))

Tensorflow and Matrices containing Variables

Recently Pablo, Dennis and I were wondering what the best way is to build Tensors with variables inside. I’ve found three ways (that largely mirror the numpy equivalents) – basically just different combinations of stacking, concatenating, reshaping and gathering. [related SO question]

import tensorflow as tf
import numpy as np

a = tf.Variable(1.0,dtype=np.float32)
b = tf.Variable(2.0,dtype=np.float32)
with tf.GradientTape() as t:
    #these lines are equivalent:
    M = tf.reshape(tf.gather([a**2,b**2,a**2/2,1],[0,2,3,1]),[2,2])
    M = tf.reshape(tf.stack([a**2,a**2/2,1,b**2]),[2,2])
    M = tf.concat([tf.stack([[a**2,a**2/2]]),tf.stack([[1,b**2]])],0)
    gradients = t.gradient(tf.linalg.det(M),[a,b])
    print(gradients)
[<tf.Tensor: shape=(), dtype=float32, numpy=7.000001>, <tf.Tensor: shape=(), dtype=float32, numpy=4.0000005>]

I thought I’d just add that one (possibly unwise) default behaviour of the gradient method is that, if one asks for the derivative of a matrix, it will return the derivative of the reduce_sum of the matrix:

with tf.GradientTape() as t:
    M = tf.concat([tf.stack([[a**2,a**2/2]]),tf.stack([[1,b**2]])],0)
    gradients = t.gradient(M,[a,b])
    print(gradients)
[<tf.Tensor: shape=(), dtype=float32, numpy=3.0>, <tf.Tensor: shape=(), dtype=float32, numpy=4.0>]

As one can see, this is returning the derivative of the sum of M: with a=1 and b=2, d/da (a**2 + a**2/2 + 1 + b**2) = 3a = 3, and d/db = 2b = 4.