Different Roads Take Me Home: The Nonlinear Relationship Between Distance and Flows During China’s Spring Festival – Humanities and Social Sciences Communications
The burgeoning field of explanatory machine learning opens a new path for modeling mobility. Machine learning has traditionally been regarded as a tool for prediction rather than explanation, but approaches such as Gradient Boosting Decision Trees challenge this view: they can reveal nonlinear relationships among variables through tools like partial dependence plots. Much of this explanatory machine learning work, especially on topics like TOD, has been applied within intra-urban transportation research. It is time to extend these methodologies to inter-city mobility.
Building upon these research advancements, we propose a framework aimed at unraveling the complexities of periodical mobility during the Spring Festival (Wang et al., 2016). This framework focuses on the interplay between periodical mobility and interprovincial spatial configurations, resulting in a model that adeptly captures the unique patterns of interprovincial periodical mobility in China. Our approach is particularly suited to decipher the dynamics of periodical mobility and its underlying processes within China’s distinctive provincial landscape. Here, rapid urbanization has accelerated human mobility, resulting in spatially uneven development. Over the four decades following economic reforms, China’s provinces have emerged as a complex patchwork, each characterized by unique disparities (Fang et al., 2020). Understanding these disparities is vital for grasping the nuances of China’s urbanization dynamics, linking mobility patterns to broader socioeconomic transformations.
In this framework, geographical distance is identified as an essential yet nonlinear factor shaping mobility. Distance introduces varying degrees of cost and uncertainty, significantly influencing migration patterns (Zipf, 1946). These disparities in mobility lead to diversity, further exacerbating spatial divisions between provinces (Cui et al., 2020). The hukou system adds another layer of complexity, triggering seasonal migrations between urban and rural areas, most notably during the Spring Festival.
Our study focuses on the nuanced relationship between geographical distance and mobility, offering insights into urban-rural dynamics and regional development (Amini et al., 2014; Liu et al., 2014). By probing provincial disparities, we illuminate the ongoing separation between employment and settlement in China, driven by periodical flows—crucial for understanding urbanization at a national scale. We introduce three nuanced ‘distance-mobility intensity’ hypotheses to capture these complex interactions:
- The Plateau Hypothesis (A: Plateau): Asserts that within a province, mobility intensity initially increases only marginally with distance, quickly reaching a plateau. This saturation is due to the province’s cohesive socioeconomic structure and integrated infrastructure.
- The Drop Hypothesis (B: Drop): Suggests that mobility intensity declines sharply upon reaching provincial boundaries. This ‘boundary effect’ is due to a mix of administrative, socioeconomic, and cultural discontinuities that impede movement, further reinforced by the hukou system (Chan, 2014; Chen and Fan, 2016).
- The Rebound Hypothesis (C: Rebound): Proposes that mobility intensity revives beyond a certain threshold due to the attraction of more economically developed provinces, which offer superior economic conditions, job opportunities, and public services.
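Taken together, the three hypotheses describe a piecewise curve: a slight rise to a plateau, a sharp drop, and a partial rebound. The shape can be sketched as a simple function. This is only an illustration: the breakpoints (`plateau_at`, `border_at`, `rebound_at`) and the intensity levels are hypothetical placeholders, not estimates from the study.

```python
def hypothesised_intensity(distance_km, plateau_at=100.0, border_at=300.0,
                           rebound_at=800.0):
    """Stylised mobility intensity (arbitrary units) as a function of distance.

    All breakpoints and levels are hypothetical and serve only to draw the
    shape implied by hypotheses A, B, and C.
    """
    if distance_km < plateau_at:
        # A: marginal increase that quickly saturates
        return 0.9 + 0.1 * distance_km / plateau_at
    if distance_km < border_at:
        # A: plateau within the province
        return 1.0
    if distance_km < rebound_at:
        # B: sharp drop once the provincial boundary is crossed
        return 0.4
    # C: partial rebound toward distant, more developed provinces
    return 0.7


# Evaluate the stylised curve at a few illustrative distances (km).
curve = [hypothesised_intensity(d) for d in (0, 50, 200, 500, 1000)]
print(curve)
```

A curve of this kind cannot be represented by a single distance-decay coefficient, which is why a model that does not assume a fixed functional form is needed.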
Diverging from conventional models, our approach underscores mobility’s periodic and multifaceted nature, particularly during the Spring Festival migration. We highlight the intricate interplay of inter-provincial distances and migratory flows, integrating socioeconomic and cultural factors into our nonlinear assessments. This expands the understanding of distance decay typologies, enhances spatial trend analyses, and informs strategies for optimizing metropolitan areas and urban conglomerates.
In this research, we utilized the Tencent Migration Dataset to dissect complex population migration patterns. Originating from Tencent’s extensive software ecosystem, this dataset captures inter-city movements by analyzing smart device geolocation data. Its credibility and reliability are reinforced by extensive scholarly usage focused on urban connectivity and mobility. The Tencent Location Big Data Platform, with its precise and comprehensive migration data, forms the backbone of our analysis.
We collected data for the two weeks leading up to the 2018 Spring Festival, from February 1st to 14th, at both national and international scope. This process yielded approximately 40,289 daily records detailing the origins, destinations, volumes, and timings of population flows. After rigorous data cleaning and aggregation, we refined the dataset to 20,155 records representing population flows between prefectural-level cities during the festival. These records are the basis for analyzing the strength of urban connectivity.
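The aggregation step can be sketched as follows. The record layout, the city names, the flow values, and the cleaning rules (dropping missing volumes and same-city records) are all assumptions made for illustration; the source does not specify the actual cleaning procedure.

```python
from collections import defaultdict

# Hypothetical daily records: (date, origin, destination, flow volume).
# The values are invented purely to demonstrate the aggregation.
daily_records = [
    ("2018-02-01", "Beijing", "Zhengzhou", 120.0),
    ("2018-02-02", "Beijing", "Zhengzhou", 150.0),
    ("2018-02-01", "Shanghai", "Hefei", 90.0),
    ("2018-02-01", "Shanghai", "Shanghai", 10.0),  # same-city record
    ("2018-02-02", "Shanghai", "Hefei", None),     # missing volume
]


def aggregate_flows(records):
    """Sum daily flows into one total per (origin, destination) pair."""
    totals = defaultdict(float)
    for _date, origin, destination, flow in records:
        # Assumed cleaning rules: skip missing volumes and self-loops.
        if flow is None or origin == destination:
            continue
        totals[(origin, destination)] += flow
    return dict(totals)


pair_flows = aggregate_flows(daily_records)
print(pair_flows)
```

Collapsing the daily series to one record per city pair is what reduces the row count between the raw and the refined dataset.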
Our variables follow the logic of gravity models. The dependent variable, the strength of inter-city connectivity (Flow), was derived from Tencent’s migration data, which is well suited to this purpose because of its wide user base (Zhang et al., 2020). The key independent variable was the geometric distance between city centers (Distance). Control variables included city population size, economic output (GDP), public service levels, environmental quality, population density, and industrial structure. These variables were sourced from the ‘China City Statistical Yearbook’ to ensure a comprehensive analysis.
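For reference, the classical gravity specification behind variables of this kind can be written as below. This is the textbook form, not necessarily the specification estimated in this study; the symbols $G$, $\alpha$, $\beta$, and $\gamma$ are generic notation introduced here for illustration.

```latex
F_{ij} = G \, \frac{P_i^{\alpha} \, P_j^{\beta}}{d_{ij}^{\gamma}}
\quad\Longleftrightarrow\quad
\ln F_{ij} = \ln G + \alpha \ln P_i + \beta \ln P_j - \gamma \ln d_{ij}
```

Here $F_{ij}$ is the flow between cities $i$ and $j$, $P_i$ and $P_j$ are their masses (for example population or GDP), and $d_{ij}$ is the distance between them. In log form the effect of distance is a single constant elasticity $-\gamma$, which is the linearity assumption that a tree-based model relaxes.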
We explored urban traffic dynamics using decision tree-based ensemble learning, with the Gradient Boosting Machine (GBM) as our primary tool, following Friedman (2001). We chose GBM over traditional linear regression because it handles nonlinear relationships well and is robust on large, complex datasets. GBM combines decision trees with iterative optimization: each iteration adds a tree that corrects the errors of the current ensemble, improving accuracy across iterations (Yang et al., 2024; Leech et al., 2023). This captures intricate data patterns that often elude linear models (Sprangers et al., 2021; Leech et al., 2023). Partial dependence plots then show how the predicted outcome responds to each independent variable and offer insight into variable interactions.
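The iterative idea behind gradient boosting can be shown with a minimal from-scratch sketch. It assumes squared-error loss, a single feature, and one-split trees (stumps); the toy data, the helper names `fit_stump` and `boost`, and the settings are hypothetical and much simpler than a production GBM.

```python
def fit_stump(xs, residuals):
    """Fit the best one-split regression tree to the residuals."""
    best = None
    # Exclude the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the residuals."""
    base = sum(ys) / len(ys)  # initial constant prediction
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # For squared error, the pseudo-residual is simply y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Toy non-monotone target: high, then low, then partly recovered.
xs = [float(i) for i in range(1, 21)]
ys = [1.0 if x <= 6 else 0.3 if x <= 14 else 0.7 for x in xs]

model = boost(xs, ys)
mean_y = sum(ys) / len(ys)
mse_base = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_model = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_base, mse_model)
```

A straight line fitted to the same toy data could not reproduce both the fall and the later rise, whereas the sum of stumps approximates both.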
During data preprocessing, we first applied logarithmic transformations to mitigate skewness. We then fitted the Gradient Boosting Machine (GBM) model, which improves predictive precision by fitting each new tree to the pseudo-residuals of the current ensemble (Bühlmann and Hothorn, 2007). Hyperparameters were tuned with RandomizedSearchCV, which samples candidate configurations at random from the hyperparameter space and scores each by cross-validation, a more efficient strategy than exhaustive search (Friedman, 2001).
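A pipeline of this kind might look like the following sketch in scikit-learn. The data are synthetic, and the feature set, search ranges, number of sampled configurations, and fold count are assumptions chosen to keep the example small; they are not the settings used in the study.

```python
import numpy as np
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (hypothetical): flows fall with distance and
# rise with population, with multiplicative noise.
rng = np.random.default_rng(0)
n = 300
distance = rng.uniform(10, 2000, n)
population = rng.lognormal(mean=6.0, sigma=0.5, size=n)
flow = population / distance * rng.lognormal(0.0, 0.1, n)

# Log transforms reduce the right skew of flows, distances, and populations.
X = np.column_stack([np.log(distance), np.log(population)])
y = np.log1p(flow)

# Assumed search space; scipy's uniform(loc, scale) covers [loc, loc + scale].
param_distributions = {
    "n_estimators": randint(50, 200),
    "learning_rate": uniform(0.01, 0.19),
    "max_depth": randint(2, 5),
    "subsample": uniform(0.6, 0.4),
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,  # number of random configurations to try
    cv=3,      # 3-fold cross-validation per configuration
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_
print(best_params, search.best_score_)
```

Random sampling keeps the cost fixed at `n_iter` configurations regardless of how many hyperparameters are searched, which is the practical advantage over a full grid.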
Our study validated the three hypotheses on urban mobility intensity and distance using the GBM model and partial dependence plots. Because GBM does not impose a functional form, it can capture the complex relationships these hypotheses describe (Pancerasa et al., 2019; Flowerdew, 2010). The three hypotheses are the Plateau Hypothesis, suggesting stability in mobility intensity across short to medium distances; the Drop Hypothesis, indicating a decline at provincial borders; and the Rebound Hypothesis, proposing increased intensity around economically advanced provinces.
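A partial dependence curve can be computed directly from its definition: fix the feature of interest at each grid value for every observation, predict, and average. The sketch below does this by hand on synthetic data; the helper `partial_dependence_1d`, the two features, and the model settings are hypothetical. The toy data contain only a plain distance decay, so the resulting curve is monotone, unlike the plateau, drop, and rebound pattern hypothesized above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def partial_dependence_1d(model, X, feature, grid):
    """Average prediction when one feature is forced to each grid value."""
    averaged = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the feature for all rows
        averaged.append(model.predict(X_mod).mean())
    return np.array(averaged)


# Synthetic log-scale data (hypothetical): flow falls with distance.
rng = np.random.default_rng(1)
n = 400
log_distance = rng.uniform(np.log(10), np.log(2000), n)
log_population = rng.normal(6.0, 0.5, n)
y = log_population - log_distance + rng.normal(0, 0.05, n)
X = np.column_stack([log_distance, log_population])

model = GradientBoostingRegressor(
    n_estimators=100, max_depth=2, random_state=0
).fit(X, y)

grid = np.linspace(log_distance.min(), log_distance.max(), 20)
pd_curve = partial_dependence_1d(model, X, feature=0, grid=grid)
print(pd_curve)
```

Plotting `pd_curve` against `grid` gives the partial dependence plot for distance; scikit-learn's `sklearn.inspection` module offers equivalent functionality with plotting support.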
By combining GBM with the interpretive strength of partial dependence plots, our methodological approach provides a comprehensive framework for understanding how geographical distance shapes urban mobility. Unlike traditional linear models, it does not force a single rate of distance decay, so it supports a more nuanced analysis that treats distance as a fundamental factor in urban mobility patterns and reveals the intricate relationships between the two.