Our project set out to develop a method for identifying viable classifiers for gentrification. Our first obstacle was to avoid incorrectly assuming that a specific socioeconomic trend, such as a rise in average housing prices or in average income over time in a given county, is inherently correlated with gentrification in that area. After an extensive review of prior research on gentrification, we decided that we needed a unit of area granular enough to minimize the number of unrelated factors driving the socioeconomic changes.

Eventually we settled on using data from all of the census tracts in a number of counties in states across the United States. Our plan was to gather data from the US Census Bureau (from 2000 and 2010) for those areas and look at all of the features and covariates that might be relevant. We used a dimensionality reduction/feature extraction method known as Principal Component Analysis (PCA) to reduce our feature set to only the k most relevant variables, which we then gave high priority.
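As a rough illustration of the PCA step, the sketch below finds the first principal component of a tiny made-up table of tract features and ranks the features by the size of their loadings on it. It is not the project's code: the feature names and numbers are hypothetical, the power iteration is written in plain Python only to keep the example self-contained, and ranking variables by loading is just one possible reading of "the k most relevant variables". In practice the features would normally be standardized first.

```python
import math

def center(rows):
    """Subtract each column's mean so every feature is centered at zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=500):
    """Power iteration: approximates the leading eigenvector of cov."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the matching eigenvalue (v has unit length).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

# Hypothetical feature names and values, one row per census tract.
feature_names = ["median_income", "median_home_value", "vacancy_rate"]
tracts = [
    [10.0, 20.0, 1.0],
    [12.0, 24.5, 0.8],
    [14.0, 27.5, 1.1],
    [16.0, 32.0, 0.9],
    [18.0, 36.5, 1.2],
]

cov = covariance(center(tracts))
component, eigenvalue = first_component(cov)

# Share of total variance captured by the first component.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_ratio = eigenvalue / total_variance

# Rank the original features by the absolute size of their loading.
ranked = [name for _, name in
          sorted(zip((abs(x) for x in component), feature_names), reverse=True)]
print(ranked, round(explained_ratio, 3))
```

Keeping the top k entries of `ranked` would give a shortlist of variables to prioritize. Note that the components themselves are linear combinations of all the input features, not a subset of them.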

Using the selected features, we built a Logistic Regression model with the labels we had defined, and verified relationships with a positive or negative percent change in those labels between 2000 and 2010. Using the state average percent change for each feature, we labeled each feature value in our dataset as above or below that average, giving us binary features with which to predict whether an area is highly gentrified or not. The labels whose percent change we chose to predict were renter-occupied units, population density, and minority (Black and Hispanic) population.
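The pipeline described above can be sketched as follows: compute each tract's percent change between 2000 and 2010, mark it as above or below the state average, and fit a logistic regression on the resulting binary features. This is a minimal sketch under stated assumptions, not the project's implementation. All of the numbers are invented, the choice of median income and median home value as features is hypothetical, the label is assumed to be "percent change in renter-occupied units is above the state average", and the model is fitted with a hand-written gradient descent so the example needs no external libraries.

```python
import math

def percent_change(old, new):
    """Percent change from the 2000 value to the 2010 value."""
    return (new - old) / old * 100.0

def above_average(values):
    """1 if a value is above the mean of its group (here, one state), else 0."""
    mean = sum(values) / len(values)
    return [1 if v > mean else 0 for v in values]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    """Predict 1 when the fitted probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Invented 2000 and 2010 values for six tracts in one hypothetical state.
income_2000 = [40000, 42000, 38000, 50000, 45000, 41000]
income_2010 = [60000, 44000, 39000, 75000, 46000, 61500]
home_2000 = [200, 210, 190, 250, 220, 205]    # median home value, thousands
home_2010 = [320, 220, 200, 400, 330, 215]
renter_2000 = [500, 480, 510, 450, 470, 490]  # renter-occupied units
renter_2010 = [650, 485, 515, 600, 475, 495]

def binary_column(old, new):
    """Percent change per tract, then above/below the state average."""
    return above_average([percent_change(o, n) for o, n in zip(old, new)])

# Binary features: one row per tract, one column per feature.
X = [list(row) for row in zip(binary_column(income_2000, income_2010),
                              binary_column(home_2000, home_2010))]
# Assumed label: renter-occupied units grew faster than the state average.
y = binary_column(renter_2000, renter_2010)

w, b = fit_logistic(X, y)
print(X, y, predict(X, w, b))
```

With real data, the binarization would be applied separately within each state, and the model would be evaluated on held-out tracts instead of the tracts it was trained on.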

These predictions give us a reliable method for simplifying the broad and complex concept of gentrification by assigning quantitative predictors to it. This allows us to measure the degree of change across a large number of locations and to use those measurements as potential indicators of gentrification in the United States.

https://github.com/rohsuresh/datax-project