© Randall Munroe, 2022
Pulling any number out of any other number by repeatedly cramming one into a bunch of units and rounding until it starts looking like the other.
Try it out at https://rounding.lam.io!
This is a fairly straightforward use of the Wikipedia Convert module's database of units, which I reused from a previous project (phrase2unit: Implementing XKCD 2312). All this does is a greedy linear search over all units whose kind matches the input. The set is small enough (<100k entries) that this isn't very expensive. In fact, sparsity is a big problem with the original dataset: it doesn't fill the real line densely enough to converge for most pairs of numbers and units. Since units can be arbitrarily combined, the live implementation uses a cross product of all units in the base Wikipedia database (and their reciprocals), which gives decent performance and coverage at the expense of some very weird-looking but valid units.
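As a rough illustration of the cross product, here is a minimal Python sketch. The `Unit` class, the three-exponent `dims` tuple, the naming scheme for combined units, and the three sample units are all assumptions made for this example, not the site's actual schema. The idea is only that conversion factors multiply while dimension exponents add, and that a reciprocal inverts the factor and negates the exponents.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Unit:
    name: str
    factor: float  # size of one unit expressed in SI base units
    dims: tuple    # exponents of (m, s, kg); hypothetical layout


def reciprocal(u):
    # Inverting a unit inverts its factor and negates every exponent.
    return Unit(f"({u.name})^-1", 1.0 / u.factor, tuple(-d for d in u.dims))


def combine(a, b):
    # Multiplying two units multiplies factors and adds exponents.
    dims = tuple(x + y for x, y in zip(a.dims, b.dims))
    return Unit(f"{a.name}·{b.name}", a.factor * b.factor, dims)


def cross_product(base):
    # Pool = base units plus their reciprocals, then every ordered pair.
    pool = list(base) + [reciprocal(u) for u in base]
    return [combine(a, b) for a, b in product(pool, repeat=2)]


# Illustrative sample only; the real database is far larger.
base = [
    Unit("ft", 0.3048, (1, 0, 0)),
    Unit("h", 3600.0, (0, 1, 0)),
    Unit("lb", 0.45359237, (0, 0, 1)),
]
combined = cross_product(base)
```

With three base units the pool has six entries and the product has 36, including sensible results such as `ft·(h)^-1` (a speed) alongside odd but valid ones such as `ft·(ft)^-1` (unitless, factor 1).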
Most of the heavy lifting is done by Postgres, specifically this linear search of all valid unit conversions post-rounding:
    SELECT * FROM (
        SELECT *, ABS(LOG(st0.vto / <target quantity>)) AS diff FROM (
            SELECT long_name AS vto_unit_long,
                   name AS vto_unit,
                   factor,
                   ROUND(<starting quantity> / factor) * factor AS vto
            FROM units
            WHERE pool = %s AND factor > 0
              AND si_m=%s AND si_s=%s AND ...
        ) st0 WHERE st0.vto > 0
    ) st1 ORDER BY st1.diff ASC LIMIT 1;
which is really as brute force as it looks: a sequential scan without any indexing, after cutting the list down to the units whose dimensions match the queried unit. As mentioned above, the number of matching units stays under 100k even in the worst case, so this is pretty tractable.
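For readers who prefer to see the same step outside SQL, here is a minimal Python sketch of one greedy iteration. The function names, the `(name, factor)` tuple layout, and the sample units are assumptions for illustration. Rounding halves up is also an assumption: how Postgres `ROUND` breaks ties depends on the column type.

```python
import math


def round_half_up(x):
    # Assumed tie-breaking rule; avoids Python's round-half-to-even.
    return math.floor(x + 0.5)


def best_step(quantity, target, units):
    """Return (diff, name, vto) for the unit whose rounded conversion
    lands closest to target in log space, or None if nothing qualifies.

    units: iterable of (name, factor) pairs sharing one dimension,
    with factor being the unit's size in SI base units.
    """
    best = None
    for name, factor in units:
        if factor <= 0:
            continue
        # Convert into the unit, round, and convert back.
        vto = round_half_up(quantity / factor) * factor
        if vto <= 0:
            continue  # rounded down to zero; no usable result
        diff = abs(math.log10(vto / target))
        if best is None or diff < best[0]:
            best = (diff, name, vto)
    return best


# Illustrative length units, factors in metres.
units = [
    ("m", 1.0),
    ("ft", 0.3048),
    ("yd", 0.9144),
    ("fathom", 1.8288),
    ("furlong", 201.168),
]
step = best_step(1.0, 2.0, units)
```

Starting from 1 m and aiming for 2 m, the fathom wins in this sample: 1 m is about 0.55 fathoms, which rounds up to 1 fathom, or 1.8288 m. The furlong drops out because the quantity rounds to zero. Repeating the step from the new value is what moves the number toward the target.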
The base units that Wikipedia provides have an interesting distribution, with energy having the most diverse offerings:
Unit type | Count |
---|---|
Energy | 205 |
Volume | 143 |
Length | 90 |
Area | 60 |
Mass | 58 |
Flow (volume/s) | 54 |
Force | 50 |
Time | 44 |
Pressure/Energy per unit volume | 36 |
Density | 34 |
Speed | 34 |
Per unit area | 31 |
Molar rate | 27 |
Power | 27 |
Mass per unit area | 16 |
Unitless | 15 |
Linear density | 15 |
Absorbed radiation dose (energy/mass) | 15 |
Temperature | 14 |
Power per unit mass | 10 |
Acceleration | 9 |
Magnetic field strength | 6 |
Chemical amount | 6 |
Energy per chemical amount | 3 |
Per unit time | 3 |
Per unit volume | 3 |
Charge | 3 |
Mass per unit power | 2 |
Mass per unit time | 2 |
Pressure per unit distance | 2 |
Force per unit distance | 2 |
Voltage | 1 |
Electrical current | 1 |
Luminous intensity | 1 |
The combined units naturally have a different mix of dimensions, although fortunately many of the ones near the top are named, common units that people like to try:
Unit | Unit type | Count |
---|---|---|
(none) | Unitless | 97268 |
m¹ | Length | 38232 |
m⁻¹·s⁻²·kg¹ | Pressure/Energy per unit volume | 38131 |
m⁻⁵·s²·kg⁻¹ | ? | 29930 |
s⁻²·kg¹ | Force per unit distance | 27111 |
m⁻⁴·s²·kg⁻¹ | ? | 25955 |
m¹·s⁻²·kg¹ | Force | 25761 |
m² | Area | 25636 |
m³ | Volume | 24836 |
m³·s⁻²·kg¹ | ? | 23400 |
m⁻⁴·s⁴·kg⁻² | ? | 20882 |
s¹ | Time | 18702 |
m²·s⁻²·kg¹ | Energy | 18597 |
m⁴ | ? | 17645 |
m²·s⁻² | Absorbed radiation dose (energy/mass) | 16343 |
m⁻⁵·s³·kg⁻¹ | ? | 15012 |
m²·s⁻³·kg¹ | Power | 14360 |
m¹·s¹·kg⁻¹ | ? | 14106 |
Further optimizations are difficult, especially within the constraints of an RDBMS. Where the 0.5 rounding cliffs land depends so heavily on the input that it's pretty much impossible to build an index ahead of time that serves all inputs. It is possible to search progressively by range (e.g. when incrementing, look first for units where the quantity lands in 0.5-1, then 1.5-2, etc.), although hits in the lower ranges won't necessarily advance fastest: landing at, say, 0.99 gains much less than landing at 20.5. With much larger inputs and search spaces it might be worth doing this progressive search and finding a reasonable threshold, but for now, with this set of units, brute force is actually not bad and has minimal overhead.
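To make the point about uneven gains concrete, here is a small Python sketch that computes how much a single rounding step multiplies a value, depending on where the converted quantity lands. The function name `rounding_gain` is made up for this example, and rounding halves up is an assumption.

```python
import math


def rounding_gain(x):
    """Factor by which rounding to the nearest integer (halves up)
    scales a positive converted quantity x."""
    return math.floor(x + 0.5) / x


# Landing spots inside the "round up" ranges 0.5-1, 1.5-2, 20.5-21.
for x in (0.5, 0.99, 1.5, 20.5):
    print(f"{x:>5}: x{rounding_gain(x):.4f}")
```

Within each range the gain is largest right at the cliff and shrinks toward 1 as the quantity approaches the next integer, so a hit in a low range near its top edge (0.99) can be worth less than a hit in a much higher range at its cliff (20.5). That is why ordering the search by range alone doesn't guarantee the fastest progress.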