
Thursday 6 July 2017

World Population Distribution by Latitude & Longitude

After seeing the excellent stuff on this topic from Radical Cartography and Datagraver, I wanted to have my own version.

I started with the source data set from SEDAC/CIESIN (requires registration), which both of the sites named above use. I used the 2015 data set, which the Datagraver page also uses. The data comes as a GeoTIFF file containing a population count 'pixel' for every 30 arc seconds (½ arc minute, or 1/120 degree) of latitude and longitude. Since the Earth's circumference is just over 40,000 km, one step of 1/120 degree is about 40,000 km / (360 × 120) ≈ 926 m. So at latitude φ this 'pixel' is a rectangle (a trapezium, if we want to be pedantic, but never mind) that is 926 meters long and 926 cos φ meters wide: at the equator it is a 926 m × 926 m square. Actually, the data only covers latitudes from 85°N to 60°S, which leaves out Antarctica, and it also marks the interior of Greenland as 'no data', but we can live with that!
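The cell-size arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes a perfectly spherical Earth with a 40,000 km circumference, as the text does; the names `cell_size_m` and `CIRCUMFERENCE_M` are illustrative only.

```python
import math

# Assumption: spherical Earth with a 40,000 km circumference, as in the text.
CIRCUMFERENCE_M = 40_000_000
CELLS_PER_DEGREE = 120  # 30 arc seconds = 1/120 degree


def cell_size_m(lat_deg: float) -> tuple[float, float]:
    """Return (north-south, east-west) size in metres of one 30" cell at lat_deg."""
    side = CIRCUMFERENCE_M / (360 * CELLS_PER_DEGREE)  # about 926 m
    # East-west extent shrinks with the cosine of the latitude.
    return side, side * math.cos(math.radians(lat_deg))


for lat in (0, 30, 60, 85):
    ns, ew = cell_size_m(lat)
    print(f"{lat:>2}°: {ns:.0f} m × {ew:.0f} m")
```

At 60° the cell is only about half as wide as it is long, which is why raw pixel counts are not comparable across latitudes.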

This turned out to be a big exercise. For starters, the GeoTIFF file is over 400 MB in size. I used QGIS to extract the data from the TIFF file into a 3-column 'longitude, latitude, value' text file:

  1. Start QGIS
  2. 'Project' ▶ 'New'
  3. 'Layer' ▶ 'Add Layer' ▶ 'Add Raster Layer...'
  4. Select the GeoTIFF file and 'Open'
  5. Wait for the file to load (may take a while)
  6. 'Raster' ▶ 'Conversion' ▶ 'Translate (Convert Format)...'
  7. Select output file type = 'ASCII Gridded XYZ'
  8. Specify the output file, then 'OK'
  9. Wait ... a long time
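QGIS's 'Translate (Convert Format)' tool is a front end for GDAL's `gdal_translate`, and 'ASCII Gridded XYZ' is GDAL's `XYZ` output format, so the same conversion can be scripted. This is a sketch assuming the GDAL command-line tools are installed and on the PATH; the wrapper `translate_to_xyz` and the file names in the usage line are hypothetical.

```python
import subprocess


def translate_to_xyz(src_tif: str, dst_xyz: str, run: bool = True) -> list[str]:
    """Convert a GeoTIFF to GDAL's 'ASCII Gridded XYZ' text format.

    Builds the gdal_translate command line; with run=True it also executes it
    and raises CalledProcessError if gdal_translate fails.
    """
    cmd = ["gdal_translate", "-of", "XYZ", src_tif, dst_xyz]
    if run:
        subprocess.run(cmd, check=True)
    return cmd


# Usage (hypothetical file names):
# translate_to_xyz("gpw_2015_count.tif", "gpw_2015_count.xyz")
```

Expect it to take about as long as the QGIS route, since both do the same work.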

This resulted in a file that was over 40 GB in size: not a surprise, considering that it had (85 + 60) × 120 × 360 × 120 = 751,680,000 data points in it. I found that 'no data' is indicated by the special value -407649103380480, and running 'fgrep -v' to remove those points gave me a file that was still over 11 GB (with 212,565,537 data points).
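The point count and the 'no data' filtering can be sketched in Python as a streaming filter, so the 40 GB file never has to fit in memory. This assumes each line holds three whitespace- or comma-separated numbers (longitude, latitude, value) and uses the sentinel value reported above; `expected_points` and `drop_nodata` are illustrative names, not part of any tool.

```python
from typing import Iterable, Iterator

NODATA = -407649103380480.0  # 'no data' marker value reported in the text


def expected_points(lat_north: int = 85, lat_south: int = 60, per_degree: int = 120) -> int:
    """Number of 30" cells between lat_north (N) and lat_south (S), all longitudes."""
    return (lat_north + lat_south) * per_degree * 360 * per_degree


def drop_nodata(lines: Iterable[str]) -> Iterator[tuple[float, float, float]]:
    """Yield (lon, lat, value) for each XYZ line whose value is not NODATA."""
    for line in lines:
        parts = line.replace(",", " ").split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        lon, lat, value = map(float, parts)
        if value != NODATA:
            yield lon, lat, value


# Usage (hypothetical file name):
# with open("gpw_2015_count.xyz") as f:
#     kept = sum(1 for _ in drop_nodata(f))
```

Comparing the parsed value rather than the text avoids depending on how the sentinel happens to be formatted in the file.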

Finally, after running this through a jury-rigged C program to bring the granularity down to 1° instead of 30", I arrived at a 650 kB text file. I imported this into Excel and got these graphs.
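The author's C program isn't shown; the following is a hypothetical Python equivalent of the 30"-to-1° step, not a reconstruction of it. It sums each point into the 1° cell whose south-west corner is the floor of its coordinates. If the XYZ coordinates are cell centres, no 30" cell straddles a whole-degree boundary, so the floor is unambiguous.

```python
import math
from collections import defaultdict
from typing import Iterable


def aggregate_to_degrees(
    points: Iterable[tuple[float, float, float]],
) -> dict[tuple[int, int], float]:
    """Sum (lon, lat, value) points into 1° x 1° cells.

    A cell is keyed by the floor of its south-west corner, so the key (77, 28)
    covers longitudes 77°..78°E and latitudes 28°..29°N, and (-1, -1) covers
    1°W..0° and 1°S..0°.
    """
    grid: dict[tuple[int, int], float] = defaultdict(float)
    for lon, lat, value in points:
        grid[(math.floor(lon), math.floor(lat))] += value
    return dict(grid)
```

With about 52,000 land-and-sea cells at 1° (145 × 360), the result is small enough for a spreadsheet.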

The graph drawn in red is the population distribution by longitude. The one in blue is the distribution by latitude, while the one in green is the distribution by 'folded latitude', which is the absolute value of the latitude, or simply the distance from the equator.
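The three distributions are row and column sums of the 1° grid. This is a sketch assuming cells keyed by their south-west corner (integer degrees, so band k covers k° to k+1°); the folding rule is one reasonable choice, mapping band −1 (1°S to 0°) onto band 0 (0° to 1°N), and may differ from how the spreadsheet does it.

```python
from collections import Counter


def marginals(grid: dict[tuple[int, int], float]):
    """Return population totals by longitude, by latitude and by folded latitude.

    Keys are integer degrees of each band's lower edge. Folded latitude is the
    distance from the equator: southern band k (k < 0) covers |k+1|..|k| degrees,
    so it is stored under -k - 1.
    """
    by_lon, by_lat, by_dist = Counter(), Counter(), Counter()
    for (lon, lat), value in grid.items():
        by_lon[lon] += value
        by_lat[lat] += value
        by_dist[lat if lat >= 0 else -lat - 1] += value
    return by_lon, by_lat, by_dist
```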

For anyone who needs the raw data (in 1° intervals), here is the spreadsheet. The raw data is in D3:MZ183; the rest is subtotals etc.

Some statistics:

  • The mean latitude is 22.7°N, while the median latitude—the 'population equator'—is 25.2°N. The highest mode is at 26°N, followed by 23°N.
  • Going by the distance from the equator, the mean is at 26.2°. The median is at 25.8°, which means half the population lives within this distance of the equator. The highest mode is at 23°, followed by 26°.
  • 80% of the population lives within 36.9° of the equator, 90% within 43.5°, 95% within 49.6° and 99% within 55.6°. Here is a graph of the cumulative population distribution by distance from the equator:
  • The longitude with the highest population is 77°E, on which Delhi & Bangalore lie. This is followed by 121°E, on which Shanghai & Manila lie.
  • The southern hemisphere is home to only about 13% of the world's population.
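Statistics of this kind can be computed from the 1° band totals. The following is a sketch under stated assumptions, not necessarily how the figures above were derived. Population is taken as spread evenly within each band [k, k+1), so the mean uses band centres and quantiles (median, 80%, 90% and so on) are linearly interpolated inside the crossing band. A 'mode' is read as a local peak, meaning a band more populous than both neighbours. Bands use integer keys for their lower edge, and `fold` maps southern band k to -k - 1.

```python
def weighted_mean(bands: dict[int, float]) -> float:
    """Population-weighted mean of band centres (k + 0.5)."""
    total = sum(bands.values())
    return sum((k + 0.5) * v for k, v in bands.items()) / total


def quantile(bands: dict[int, float], q: float) -> float:
    """Value below which a fraction q of the population lies (interpolated)."""
    target = q * sum(bands.values())
    running = 0.0
    for k in sorted(bands):
        v = bands[k]
        if v > 0 and running + v >= target:
            return k + (target - running) / v
        running += v
    return max(bands) + 1.0  # q = 1 with rounding: top edge of the last band


def modes(bands: dict[int, float], n: int = 2) -> list[int]:
    """Local peaks (bands above both neighbours), most populous first."""
    peaks = [k for k, v in bands.items()
             if v > bands.get(k - 1, 0.0) and v > bands.get(k + 1, 0.0)]
    return sorted(peaks, key=bands.get, reverse=True)[:n]


def fold(lat_bands: dict[int, float]) -> dict[int, float]:
    """Distance-from-equator bands: southern band k joins band -k - 1."""
    folded: dict[int, float] = {}
    for k, v in lat_bands.items():
        d = k if k >= 0 else -k - 1
        folded[d] = folded.get(d, 0.0) + v
    return folded


def southern_share(lat_bands: dict[int, float]) -> float:
    """Fraction of the population in bands south of the equator."""
    return sum(v for k, v in lat_bands.items() if k < 0) / sum(lat_bands.values())
```

For example, `quantile(lat_bands, 0.5)` gives a 'population equator' and `quantile(fold(lat_bands), 0.8)` the distance within which 80% of people live.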

Here is my attempt at plotting the population data as a logarithmic-scale 'heat map' of the 1° grid. I've used violet for less than 100, light blue for 100 to 1,000, darker blue for 1,000 to 10,000, green for 10,000 to 100,000, yellow for 100,000 to 1,000,000, orange for 1,000,000 to 10,000,000 and red for more than 10,000,000.
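The colour scale is one band per power of ten, which maps naturally onto a binary search over the band limits. This sketch assumes each lower limit is inclusive (exactly 1,000 is darker blue, exactly 10,000,000 is red), since the text leaves the boundaries open; `band_colour` is an illustrative name.

```python
from bisect import bisect_right

# Upper limits of each band except the last, and the colours from the text.
LIMITS = [100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000]
COLOURS = ["violet", "light blue", "darker blue", "green", "yellow", "orange", "red"]


def band_colour(population: float) -> str:
    """Map a 1° cell's population to its log-scale colour band."""
    # bisect_right counts how many limits are <= population.
    return COLOURS[bisect_right(LIMITS, population)]
```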