The census dataset that I am using is http://www.stats.govt.nz/Census/2013-census/data-tables/electorate-tables.aspx
(shout out to NZ Herald Data Editor Harkanwal
Singh), which conveniently provides numbers at the electorate level. I can
also recommend the data files that Jonathan Marshall produced from that dataset, which are available
on GitHub and are
much nicer to work with. The vote numbers come from electionresults.org.nz,
pulled and processed with some code kindly provided by Chuan-Zheng Lee.

After rejecting some of the less
interesting variables provided in the census data (mostly about employment), I
was left with only 1815 variables to check. Yes, that’s still a lot of
variables. When I initially set the analysis to return correlations that were
statistically significant at the 5% level, it returned about 12,000
correlations. After talking to Chuan-Zheng I realised that I was dumb and
forgot that I was actually working with the entire population, where
“statistically significant” no longer makes sense because we’re not working
with samples. So I got rid of statistical significance in terms of individual
correlations entirely.

A straight bivariate correlation
analysis would return a lot of misleading correlations, because in general, if
there are more people in an electorate, there are also more people voting, and
more people in any demographic category. To counter this I followed some internet
advice and used an equation from Steiger (1980) to determine if there was a
statistically significant difference between the two correlations:

- r_{12}: the number of people in the electorate vs the number of votes for a particular party
- r_{13}: the number of votes for a particular party vs the number of people in a particular demographic group

To help ensure that the claims made were strong and unlikely to be explained by the variation in electorate populations, I set the analysis to only return correlations where the difference between r_{12} and r_{13} was statistically significant at the 0.1% level.
Additionally, any correlations with
an r between -0.1 and 0.1 were removed and analysed separately, as they are
so close to 0 that there is most likely no relationship
between the two variables (which may be statistically significant but not all
that interesting for most of what we’re looking at here).
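
For anyone wanting to try the filtering step themselves, here is a rough sketch of how it could look in plain Python. It is not the code I actually ran. It assumes the Z statistic from Steiger (1980) for two dependent correlations that share a variable, which also needs r_{23} (electorate population vs the demographic group). The function names `steiger_z` and `keep_correlation` are made up for illustration, and I am assuming the ±0.1 cut-off is applied to r_{13}.

```python
import math

def steiger_z(r12, r13, r23, n):
    """Steiger's (1980) Z for two dependent correlations sharing a variable.

    r12, r13: the two correlations being compared.
    r23: correlation between the two non-shared variables.
    n: number of observations (electorates).
    Returns (z_statistic, two_sided_p_value).
    """
    # Fisher z-transform of each correlation
    z12 = math.atanh(r12)
    z13 = math.atanh(r13)
    # Squared mean of the two correlations, used to estimate their covariance
    r_bar_sq = ((r12 + r13) / 2) ** 2
    psi = r23 * (1 - 2 * r_bar_sq) - 0.5 * r_bar_sq * (1 - 2 * r_bar_sq - r23 ** 2)
    cov = psi / (1 - r_bar_sq) ** 2
    z = (z12 - z13) * math.sqrt(n - 3) / math.sqrt(2 - 2 * cov)
    # Two-sided p-value from the standard normal, no SciPy needed
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

def keep_correlation(r12, r13, r23, n, alpha=0.001, min_abs_r=0.1):
    """Keep r13 only if it differs from r12 at the alpha level
    and is not too close to zero (assumed to apply to r13)."""
    _, p = steiger_z(r12, r13, r23, n)
    return p < alpha and abs(r13) > min_abs_r

# Made-up correlations, purely to show the call; n = 71 electorates
z, p = steiger_z(0.9, 0.5, 0.6, 71)
print(z, p, keep_correlation(0.9, 0.5, 0.6, 71))
```

The defaults mirror the thresholds described above (0.1% significance, |r| greater than 0.1); the input correlations in the example are invented and are not results from the analysis.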

I should probably note somewhere (and
here is as good a place as any) that the sample size in most cases was 71 (all
the general electorates + Maori electorates), except for the immigrant data
which was not available for the Maori electorates (and thus the sample size was
reduced to 64).

Where I’ve used r≈ instead of r=, it’s
because I’ve actually combined a couple of correlations for ease of
communication. For example, “people earning $70,001 or more” is actually
“people earning $70,001-$100,000, people earning $100,001-$150,000, and people
earning $150,001 or more”, but I didn’t want to manually group that data
because hey, I got hungry and needed time to make dinner. It’s an approximation
of the strength of relationship at least, and I guess is intended to be more directional than accurate magnitudinally (magnitude-wise? in terms of magnitude?).

Everything was done in Python (without the use of NumPy or SciPy because as it turns out I would rather spend a few hours torturing myself trying to figure out how to implement the algorithms from scratch than spend a few minutes installing some commonly used modules). In retrospect I should have just pulled out R. Fun (questionable) fact: the number of R User Group meetings per month worldwide is (on average) increasing at a rate of 0.6 meetings per month since November 2008.
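
To give a flavour of what "from scratch" means here, below is a minimal sketch of a Pearson correlation using only the standard library. It is an illustration and not my original code; the function name `pearson_r` and the numbers in the example are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Invented numbers: party votes vs people in some demographic group, per electorate
votes = [1200, 1500, 900, 2000, 1700]
group = [300, 420, 250, 610, 480]
print(pearson_r(votes, group))
```

In the real analysis each list would have one entry per electorate (71, or 64 for the immigrant data).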
