Summary: The distribution and composition of visual elements like sidewalks, greenery, building facades, cars, and pedestrians captured in the street view images had high predictive power for many urban attributes like transportation use, poverty, crime rates, and physical activity levels.
The computer vision algorithms found and quantified these relationships from the image data alone, and in many cases outperformed more conventional means of gathering such data.
Cities are complex systems, but their physical attributes offer clues to the lives within.
A new study, entitled “Urban visual intelligence: Uncovering hidden city profiles with street view images” and published on June 23rd in Proceedings of the National Academy of Sciences, demonstrates how deep learning algorithms can analyze street view images to uncover hidden socioeconomic profiles of neighborhoods across the United States.
Researchers from the University of Hong Kong, Hong Kong University of Science and Technology, Jiangxi Normal University, and MIT collected over 27 million Google Street View images from 80 U.S. counties in 7 major metropolitan areas.
Using a computer vision model, they extracted urban features from these images such as trees, sidewalks, building facades, and cars.
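The study relied on deep semantic segmentation models to label what appears in each scene, but the basic idea of turning a street view image into a numeric feature can be illustrated with a much cruder, hypothetical heuristic: count the share of pixels whose green channel dominates as a stand-in for a "green view" score. The function below is a toy sketch under that assumption, not the authors' method.

```python
def green_view_index(pixels):
    """Toy 'greenery' score: the fraction of pixels whose green channel
    clearly dominates red and blue. Real pipelines use semantic
    segmentation, not color thresholds; this is only an illustration.

    pixels: iterable of (r, g, b) tuples with 0-255 channel values.
    """
    pixels = list(pixels)
    green = sum(1 for r, g, b in pixels if g > r and g > b and g > 80)
    return green / len(pixels)

# Example: a 4-pixel "image" in which two pixels read as vegetation.
scene = [(10, 200, 10), (200, 10, 10), (10, 180, 30), (100, 100, 100)]
print(green_view_index(scene))  # 0.5
```

Aggregating a score like this over every image sampled in a neighborhood yields one row of the kind of feature table the study's models were trained on.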
The metropolitan areas included in the study were Miami–Fort Lauderdale–Pompano Beach; Los Angeles–Long Beach–Anaheim; Chicago–Naperville–Elgin; Philadelphia–Camden–Wilmington; New York–Newark–Jersey City; Boston–Cambridge–Newton; and San Francisco–Oakland–Berkeley.
This selection of metro areas covers a diverse range of geographical contexts and population sizes – from major coastal cities like New York to inland metros like Chicago.
The researchers chose these particular regions to capture street view data representing the variety of urban environments across the United States.
The spatial distribution of these features alone accounted for up to 83% of the variance in vehicle miles traveled, 64% in violent crimes, and 68% in physical inactivity.
The distribution of these visual elements captured by the computer vision algorithms correlated strongly with official crime and economic data.
And the image models often outperformed models that used more conventional demographic and population data.
For example, when predicting poverty rates, the models using street view images accounted for 62% of the variance, whereas models relying on population and demographic data captured only about 56%.
What High Crime Rates Look Like
The researchers found that areas with higher rates of violent crime had more featureless building facades and fewer windows visible from the street.
In other words, less “visual permeability” is associated with higher crime.
In contrast, areas with buildings containing more windows and transparency at street level, allowing for natural surveillance, tended to be wealthier.
Disorderly street environments with poor maintenance, lack of investment, and fewer pedestrian amenities were predictive of neighborhoods with higher levels of crime and economic disadvantage.
In contrast, crosswalks, small block sizes, and pedestrian signage were related to less crime.
Unsurprisingly, deteriorating building facades and signs of neglect, such as peeling paint and broken windows, predicted higher poverty rates.
The presence of vacant, abandoned, or dilapidated buildings visible in street view likewise pointed to elevated poverty and crime.
More graffiti and trash on the streets were associated with increased crime and poverty levels.
So was the absence of street lighting, benches, bus shelters, and other street furniture.
Areas with poorly maintained sidewalks or few pedestrian accessibility features had higher poverty and crime rates.
Sparser greenery and tree cover in the images also correlated with higher rates of crime and poverty.
“We propose ‘urban visual intelligence’ as a process to uncover hidden city profiles, infer, and synthesize urban information with computer vision and street view images,” the paper’s authors explained.
A Better Way of Collecting Urban Data
Analyzing urban environments through street view images offers several notable advantages over traditional data collection methods.
Computer vision applied to street view scenes could supplement traditional urban data sources by capturing subjective, experiential attributes of place that are hard to quantify otherwise, yet still influence neighborhood life and perceptions.
The technology allows these subjective analyses, previously feasible only through in-person observation, to be performed at scale.
The visual characteristics extracted from the images provide an objective, scalable way to measure attributes of the built environment; conventional measures such as land use surveys and in-person audits are more time-consuming, labor-intensive, and limited in scope.
With expanding image datasets from sources like Google Street View, this approach enables studying urban areas at a much larger scale and finer spatial resolution than previously feasible.
The ability to discern visual patterns also captures implicit details and intangible qualities not fully conveyed by statistical datasets or surveys.
Additionally, because street scenes are photographed repeatedly over long periods, computer vision techniques can assess change over time more readily than sporadic field studies.
By supplementing traditional indicators with automated visual intelligence, researchers and planners can gain a more comprehensive, nuanced understanding of socioeconomic trends and relationships to the built environment.
This novel method promises to unlock new urban insights at reduced cost and effort compared to existing methods.
Urban Visual Intelligence: Next Steps
“With increasing available computer vision tools and urban data, researchers can further extract semantic meanings from the images and videos of cities,” the authors wrote. “These tools and data allow urban studies to capture large-scale microvariations in cities, synthesize hidden information in cities, and infer future trends.”
Analyzing images from these 7 metropolitan areas provided a robust dataset to test how well the computer vision algorithms could uncover socioeconomic characteristics from the visual attributes of different cities and neighborhoods.
The consistent findings across the different regions also demonstrate the potential generalizability of using street view images and computer vision to understand urban life.
The study provides a foundation for exploiting street view images and computer vision to understand cities.
As the researchers summarized, “the look of the urban environment is demonstrated here to be highly connected with the well-being of a city.”
Urban planners can adopt early interventions based on visual cues rather than waiting for extensive survey results.