Thematic cultural heritage tourism trail planning integrating multi-source data and machine learning in Wuhan China
Methodological framework
The main methodological framework of this study is illustrated in Fig. 1 below.

This figure illustrates the main methodological framework of this study.
The research framework comprises four sequential stages: (1) quantification of indicators for cultural heritage sites, which serves as input; (2) many-objective algorithmic optimization; (3) route generation; and (4) visualization and output. To ensure that the proposed planning method functions effectively in real-time scenarios, and that its outcomes align with both cultural conservation goals and visitor experience enhancement, it is essential to first establish a systematic evaluation framework. This requires identifying a set of relevant indicators that can quantitatively and qualitatively capture the multidimensional attributes of heritage tourism routes. By doing so, the model can translate abstract planning objectives into measurable parameters that support algorithmic optimization. The construction of this indicator system thus serves as a critical foundation for integrating user preferences with spatial and cultural data, enabling personalized and context-sensitive route planning in historic urban environments.
Selection of relevant indicators
Following the definition of the overall planning methodology, the next step involves defining the relevant indicators for evaluating tourist trails, which serve as the basis for assessing the cultural heritage value and fulfilling tourists’ needs.
In terms of Subjective Indicators, empirical evidence confirms that tourist preference orientation constitutes the cornerstone of thematic route personalization, where multi-dimensional preference profiling enables precise demand-response planning29,30. Building on this theoretical foundation, this study categorizes travel preferences of distinct user groups into six dimensions: Thematic, Diversity, Convenience, Comfort, Rapidity, and Economy, as illustrated in Fig. 2.

Subjective weight assignment.
Specifically, the Thematic dimension emphasizes the design of tourism routes centered on particular themes (e.g., history, religion, culture) to enhance tourists’ immersive experience; Diversity focuses on providing rich experiences through popular and abundant attractions; Convenience addresses the accessibility of commuting, dining, and accommodation; Comfort reflects the quality of the spatial environment by avoiding high-traffic areas; Rapidity concerns time efficiency during transit; and Economy evaluates cost-effectiveness relative to budget. These six indicators are processed through the Weights Fuser with adjustable weights.
In terms of Objective Indicators, the heritage value assessment framework developed in this study adopts a three-tiered structure that synthesizes authoritative sources: (1) Statutory foundations from China’s cultural protection system, namely the Law on Cultural Relics Protection (2002 Revision) and the Standards for Protection of Cultural Relics31,32; (2) Internationally recognized charters, including the Venice Charter33, the Nara Document on Authenticity (1994)34, and the ICOMOS China Principles (2015)35; (3) Contextual adaptations, resulting in a four-dimensional evaluation system comprising Historical Value, Aesthetic Value, Scientific Value, and Social Value (Figs. 3 and 4).

Heritage value evaluation system.

Space environment evaluation indicator system.
This framework is intentionally designed to align with China’s national development priorities while addressing methodological limitations in conventional heritage assessments. Specifically: Historical Value (HV) integrates UNESCO Criterion (vi) to emphasize Wuhan’s revolutionary legacy36; Aesthetic Value (AV) incorporates principles of traditional Chinese landscape aesthetics following ICOMOS-IFLA protocols37; Scientific Value (SV) adapts ICCROM material standards to reflect regional architectural techniques38; Social Value (SoV) reflects participatory models grounded in China’s ecological civilization strategy39.
The selected four-dimensional system (abbreviated as HV, AV, SV, and SoV) offers a balanced integration of international conservation principles with the distinct characteristics of Chinese urban heritage. This ensures methodological robustness while responding to the unique conservation needs and policy contexts within China.
Thematic Relevance (TR) measures the strength of association between a tourist attraction and predefined thematic routes. High-TR routes systematically integrate thematically aligned attractions, thereby enhancing thematic immersion, educational value, and overall experiential quality40. As an objective indicator, TR quantifies the degree of match between an attraction and the theme assigned by the tourist route. This indicator links to the Thematic dimension in the subjective indicator framework, aiming to meet the needs of tourists seeking in-depth thematic experiences.
For example, consider the attraction “Yellow Crane Tower”. Assume the predefined themes include the Yellow Crane Tower Historical Trail, Sun Yat-sen Historical Trail, Mao Zedong Historical Trail, and Zhang Zhidong Historical Trail. The method of assigning thematic relevance to the “Yellow Crane Tower” attraction is demonstrated in Table 1 below.
In terms of Spatial Environment, micro-scale built environment characteristics significantly influence pedestrian behaviors, as demonstrated in historic street studies41,42. Building upon this foundation, we adapt a set of spatial environment evaluation indicators from established scholarly literature18,19,20, including: ① Historical Environment Quality (HQ), ② Landscape Environment Quality (LQ), ③ Service Facilities Quality (SQ), ④ Guide System Quality (GQ), ⑤ Population Density (PD), ⑥ Internal Accessibility (IA), ⑦ Discussion Heat (DH), ⑧ Praise Rate (PR).
Among these, Historical Environment Quality (HQ) is proposed to be quantified through image analysis, including streetscape image analysis, evaluating the historical atmosphere of attractions by assessing architectural style unity and the visibility of historical elements. Landscape Environment Quality (LQ) is calculated as the mean of values derived from image analysis, user feedback, and expert ratings, reflecting the degree of greening and the arrangement of buildings. Service Facilities Quality (SQ) measures convenience and service level by counting the number and types of supporting facilities, such as public toilets, catering venues, and rest areas, per square kilometer. Guide System Quality (GQ) is evaluated through field research and user questionnaires, indicating how effectively signage improves visitors’ comprehension of attraction information. Discussion Heat (DH) and Praise Rate (PR) are calculated as weighted averages of comments on various platforms: the former reflects the frequency of attraction-related discussions on social media and travel websites, while the latter indicates overall tourist satisfaction. Internal Accessibility (IA) quantifies transportation efficiency within the scenic area through two core metrics: ① Path network complexity, analyzed via GIS-based road connectivity indices; ② Pedestrian walkway continuity, assessing unobstructed connections between key functional areas. Population Density (PD) reflects the distribution and flow of tourists within each area of the attraction. The optimal value for this indicator lies within a reasonable range, as values that are too high or too low may negatively affect visitor experience.
In terms of Reverse Factors, existing studies demonstrate that constraint factors play a critical role in balancing tourist preferences10, particularly in reconciling competing objectives such as time efficiency and experience depth. Therefore, this study proposes three reverse factors that must be incorporated during route optimization: ① Dwell Time (DT): The total duration of tourist activities (such as visiting, photographing, and resting) at each attraction, reflecting the intensity of the visitor experience; ② Travel Time (TT): The travel time between consecutive attractions (excluding movement within an attraction), calculated based on actual route distances; ③ Visitor Cost (C): A standardized estimate of expenditure derived from official pricing data and crowd-sourced cost information.
In terms of Analysis of the Correlation Between Subjective and Objective Indicators, the final reference system of indicators is presented in Table 2 below. Furthermore, this section analyses the relationships between subjective and objective indicators to clarify the influence each indicator (as listed in the previous section) exerts on the overall objective function.
To systematically investigate the relationships between subjective and objective indicators, a structured questionnaire was administered to 127 participants. Participants were selected via stratified sampling to ensure a 95% confidence level with a ± 5% margin of error. The sample encompassed diverse age groups (18–25: 32%; 26–40: 41%; 41–60: 22%; 60 + : 5%) and visitor types (first-time visitors: 58%; repeat visitors: 42%). All respondents met the eligibility criterion of having visited at least two cultural heritage attractions within the study area in the past year. Demographic data indicated that 44% of participants were male and 56% female; 76% held a bachelor’s degree or higher; and 84% were domestic tourists.
The questionnaire employed a 7-point Likert scale to assess bidirectional correlations between subjective and objective indicators, where +3 represented a strong positive correlation, 0 indicated no correlation, and −3 denoted a strong negative correlation. To ensure data reliability, the questionnaire achieved a Cronbach’s alpha of 0.81 across all indicator items. Attention-check questions and response-time monitoring were implemented to identify and exclude invalid responses. The resulting correlation patterns are visualized in Fig. 5.
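As an illustration of the reliability statistic reported above, the following minimal Python sketch computes Cronbach’s alpha from a respondents-by-items score table, using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the toy responses are hypothetical; they are not the survey data of this study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table of Likert scores.

    responses: list of respondents, each a list of k item scores
    (here on the -3..+3 scale described in the text).
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: three respondents answering three items identically,
# which is the perfectly consistent case.
toy = [[-3, -3, -3], [0, 0, 0], [3, 3, 3]]
alpha = cronbach_alpha(toy)
```

Values close to 1 indicate that the items move together across respondents; the reported value of 0.81 would conventionally be read as good internal consistency.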

Relevance matrix for primary and objective indicators.
Description: This heat map illustrates the patterns of relevance between indicators. Higher positive correlations (dark blue) signify stronger relationships between the two indicators, offering valuable insights for further analysis and research. Conversely, negative correlations (dark red) indicate an inverse relationship. Lighter colored areas suggest that the subjective and objective indicators are not significantly correlated from this perspective (Figs. 6 and 7).

Dynamic planning methodology framework.

Comparison between the Proposed Algorithm and the Baseline NSGA-III Algorithm.
The correlation pattern shown in Fig. 5 can be expressed in matrix form as R, which is substituted into the calculations in the later section:
$$R=\left[\begin{array}{ccccccccc}3 & 2 & 2 & 0 & 1 & 1 & 3 & 3 & 0\\ 3 & 1 & 3 & 2 & 3 & -1 & 1 & 3 & 0\\ 3 & 1 & 3 & 2 & 3 & -3 & 1 & 3 & 0\end{array}\right]$$
(2.1)
Notably, the framework establishes direct mappings between three subjective dimensions (Thematic, Rapidity, and Economy) and objective indicators, thereby eliminating the need for questionnaire-based quantification of these dimensions. Specifically, thematic evaluation corresponds directly to the Thematic Relevance (TR) metric, where each attraction’s TR score (0–3) determines its thematic rating. Rapidity is algorithmically derived from the sum of Dwell Time (DT) and Travel Time (TT) between sites, while Economy is calculated directly from Visitor Cost (C) data. These predefined mathematical relationships (Thematic = TR, Rapidity = DT + TT, and Economy = C) are encoded as deterministic rules within the NSGA-III optimization model rather than treated as subjective perceptions. Consequently, these dimensions require no survey validation, as their values are generated programmatically from objective datasets rather than self-reported by users. Based on the relevance matrix above, the weights of the optimization objectives in the many-objective optimization model can be adjusted under different subjective demand dimensions.
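One possible way to turn the relevance matrix R into indicator weights is sketched below. The text does not specify the conversion rule, so the row-wise normalization by the sum of absolute scores is an assumption, as is the reading that the three rows correspond to the three questionnaire-based dimensions (Diversity, Convenience, Comfort). The column order of R is not given in this section and is left unlabeled.

```python
# Relevance matrix R as given in Eq. (2.1).
R = [
    [3, 2, 2, 0, 1, 1, 3, 3, 0],
    [3, 1, 3, 2, 3, -1, 1, 3, 0],
    [3, 1, 3, 2, 3, -3, 1, 3, 0],
]


def row_weights(row):
    """Scale one row of Likert scores so absolute weights sum to 1.

    ASSUMPTION: this normalization rule is illustrative only; the sign of
    each score is kept so that negative correlations act as penalties.
    """
    total = sum(abs(score) for score in row)
    if total == 0:
        return [0.0 for _ in row]
    return [score / total for score in row]


# One weight vector per row of R (rows assumed to be Diversity,
# Convenience, Comfort, in that order).
W = [row_weights(row) for row in R]
```

Keeping the sign means an indicator rated −3 for a given dimension lowers that dimension's score in proportion to its weight rather than being ignored.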
Planning model construction
In terms of Introduction of the Algorithm, the Genetic Algorithm (GA) is the algorithmic framework employed in this study. It is a stochastic search algorithm inspired by the biological principle of “survival of the fittest.” As a classical global optimization method, GA offers high computational efficiency, strong practicality, and robustness, and is well suited to parallel implementation. The Non-dominated Sorting Genetic Algorithm III (NSGA-III) is a variant of GA specifically enhanced for multi-objective optimization problems. It can handle multiple conflicting objectives and generates a set of Pareto-optimal solutions (a Pareto-optimal solution is one that cannot be improved in any objective without degrading at least one other objective).
Accordingly, this study selects NSGA-III as the core optimization algorithm. Unlike NSGA-II, NSGA-III performs selection by associating solutions with a set of reference points. This reference-point mechanism keeps the generated solutions evenly spread across the objective space, preventing clustering and offering diverse route options. The algorithm is highly adaptable and performs well in high-dimensional objective spaces, making it especially suitable for “many-objective” optimization tasks (in the optimization literature, “multi-objective” typically refers to problems with 2 or 3 objectives, while “many-objective” denotes problems with at least 4 objectives; in this study, the number of optimization objectives is 6).
This flexibility grants NSGA-III a distinct advantage in tourist route planning problems. For example, when tourists place greater emphasis on thematic relevance, the algorithm can adjust weights to prioritize attractions with higher thematic association; conversely, when tourists focus more on economic factors, the algorithm favors attractions with lower costs. Thus, the adaptability of NSGA-III allows it to respond flexibly to varying optimization goals under different weight settings.
Furthermore, empirical studies report that NSGA-III achieves superior Pareto diversity when optimizing routes under real-time traffic and environmental inputs. While conventional single-objective optimization algorithms often struggle to satisfy users’ multidimensional needs simultaneously43,44, NSGA-III has been widely applied across various route-planning domains45, providing both a solid theoretical foundation and practical experience for this research.
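The reference-point mechanism mentioned above is commonly realized with the Das-Dennis simplex-lattice construction. The sketch below generates such points for six objectives. The section does not state which construction or how many divisions were used, so the function `das_dennis` and the choice of four divisions are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def das_dennis(n_obj, divisions):
    """Evenly spaced reference points on the unit simplex.

    Each point has n_obj non-negative coordinates that are multiples of
    1/divisions and sum to 1 (stars-and-bars enumeration).
    """
    slots = divisions + n_obj - 1
    points = []
    for bars in combinations(range(slots), n_obj - 1):
        coords, prev = [], -1
        for bar in bars:
            coords.append(bar - prev - 1)  # stars between consecutive bars
            prev = bar
        coords.append(slots - prev - 1)    # stars after the last bar
        points.append([c / divisions for c in coords])
    return points


# ASSUMPTION: 4 divisions per objective; the paper does not report this value.
reference_points = das_dennis(n_obj=6, divisions=4)
expected_count = comb(4 + 6 - 1, 6 - 1)  # C(9, 5) = 126
```

With six objectives the number of points grows quickly with the number of divisions, which is one reason the division count is usually kept small in many-objective settings.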
In terms of Model Construction, to better address real-world heritage tourism route planning challenges and enhance the practicality of solutions, this study introduces two major modifications to the baseline NSGA-III framework.
The first is Flexible Constraint Handling in Fitness Evaluation. A dynamic penalty-based constraint mechanism inspired by Ruiz-Meza et al.10 is embedded within the fitness function to account for budgetary and time restrictions. When the total cost or travel time of the selected set of attractions exceeds predefined thresholds, the corresponding solution incurs a significant fitness penalty. This mechanism effectively filters out infeasible solutions during the evolutionary process, thereby improving the practical feasibility of the generated route plans.
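A minimal sketch of such a penalty mechanism is given below. It assumes all objectives are expressed in minimization form (maximized objectives negated) and that the penalty grows with the relative constraint violation and with the generation count. The function name, the base coefficient, and the linear growth schedule are hypothetical; the exact penalty form used in the study is not given in this section.

```python
def penalized_objectives(objectives, total_time, total_cost,
                         max_time, max_cost,
                         generation=0, max_generations=150,
                         base_penalty=1000.0):
    """Add a constraint-violation penalty to every objective value.

    objectives: list of objective values, all to be minimized.
    ASSUMPTION: the penalty coefficient increases linearly with the
    generation index, so violations are tolerated less as the search matures.
    """
    # Relative overshoot of each limit; zero when the limit is respected.
    time_violation = max(0.0, total_time / max_time - 1.0)
    cost_violation = max(0.0, total_cost / max_cost - 1.0)
    violation = time_violation + cost_violation
    coefficient = base_penalty * (1.0 + generation / max_generations)
    return [value + coefficient * violation for value in objectives]


# Feasible route: objectives are returned unchanged.
feasible = penalized_objectives([1.0, 2.0], 100, 50, max_time=120, max_cost=60)
# Route exceeding the time limit by 25%: every objective is worsened.
infeasible = penalized_objectives([1.0, 2.0], 150, 50, max_time=120, max_cost=60)
```

Because the penalty is added to every objective, an infeasible route is dominated by any feasible route with comparable raw objective values, which pushes it out of the leading non-dominated fronts.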
The second is Integration of Dynamic Programming for Path Optimization. The specific steps are as follows (see Appendix for detailed pseudocode): ① Distance Matrix Computation: Calculate the Euclidean distance between each pair of selected attractions to form a symmetric distance matrix; ② DP Table Initialization: Define the dynamic programming (DP) table D[mask][last] following Blank & Deb’s reference-point methodology45, which is used to maintain solution diversity and avoid premature convergence. Here mask represents the set of visited attractions and last denotes the most recently visited attraction. Initialize D[1][0] = 0, representing the starting point at the first attraction; ③ Recursive Table Update: Iterate through all valid states of mask, updating D[mask][last] by considering transitions from previously visited nodes to the next unvisited node with minimal cumulative distance; ④ Optimal Sequence Recovery: Perform reverse backtracking from the final state to reconstruct the optimal visitation sequence, iteratively removing the most recently visited node based on D[mask][last] values until the complete route is recovered.
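The four steps above correspond to a bitmask dynamic program of the Held-Karp type. The following Python sketch is an independent illustration, not the Appendix pseudocode: it assumes an open path that starts at the first selected attraction and visits every other selected attraction once, and it stores a `parent` table (a hypothetical helper) to make the backtracking step explicit.

```python
from math import dist, inf


def shortest_visit_order(points):
    """Return (length, order) of the shortest open path starting at index 0.

    points: list of (x, y) coordinates of the selected attractions.
    """
    n = len(points)
    # Step 1: symmetric Euclidean distance matrix.
    d = [[dist(p, q) for q in points] for p in points]
    full = (1 << n) - 1
    # Step 2: D[mask][last] = shortest length visiting `mask`, ending at `last`.
    D = [[inf] * n for _ in range(1 << n)]
    parent = [[-1] * n for _ in range(1 << n)]
    D[1][0] = 0.0
    # Step 3: extend every reachable state by one unvisited attraction.
    for mask in range(1, 1 << n):
        for last in range(n):
            if D[mask][last] == inf:
                continue
            for nxt in range(n):
                if mask & (1 << nxt):
                    continue
                new_mask = mask | (1 << nxt)
                candidate = D[mask][last] + d[last][nxt]
                if candidate < D[new_mask][nxt]:
                    D[new_mask][nxt] = candidate
                    parent[new_mask][nxt] = last
    # Step 4: pick the best end point and backtrack through `parent`.
    last = min(range(n), key=lambda j: D[full][j])
    best, mask, order = D[full][last], full, []
    while last != -1:
        order.append(last)
        previous = parent[mask][last]
        mask ^= 1 << last
        last = previous
    order.reverse()
    return best, order


# Toy coordinates on a line; the shortest path sweeps left to right.
length, order = shortest_visit_order([(0, 0), (2, 0), (1, 0), (3, 0)])
```

The table has 2^n × n entries, so this approach is only practical for the small subset of attractions selected in a single route, not for the full candidate set.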
These enhancements to NSGA-III ensure that the generated route solutions are not only Pareto-optimal across multiple objectives but also practical and executable.
Definition of Indicators and Algorithmic Framework: According to Table 1, six optimization objectives are identified: ① Maximize Thematic; ② Maximize Diversity; ③ Maximize Convenience; ④ Maximize Comfort; ⑤ Minimize Rapidity (total time); ⑥ Minimize Economy (total cost). The constraints are defined as follows: ① Total Travel Time limit (Max_Time); ② Total Cost limit (Max_Cost).
Figure 8 illustrates the structure of the many-objective optimization model for heritage tourism trail planning, highlighting key input and output elements alongside core evaluation indicators.

Following the logic of the NSGA-III model, input variables include thematic-related historical and cultural resource types, geographic coordinates of attractions, site-level environmental conditions, user-generated feedback, and transportation conditions. Based on individual user characteristics and preferences, the relative weights of evaluation indicators are dynamically adjusted to generate optimal heritage tourism routes that cater to user-specific requirements.
Experimental process
In terms of collecting and organizing background information, the study area is selected in three steps. First, the scope of the historic urban area is defined by consulting relevant sources such as historical archives, local records, and academic research, in order to delineate the boundaries of the heritage site. Second, urban multimodal data are integrated to collect geographic coordinates of buildings. Finally, all collected data are input into a GIS platform to ensure the geographic information is accurate and complete, thereby establishing a foundational information database of the cultural heritage site’s contours and key locations.
Systematic organization of information related to the cultural heritage of the historic urban area proceeds as follows. First, the architectural heritage within the historic urban area is organized based on historical records, considering factors such as the age of buildings, significant historical events, and notable figures. Second, this information is systematized and categorized according to different themes, classifying architectural heritage by criteria such as construction era, event relevance, and residences of prominent individuals, to form diversified historic district tourism trails. Based on extensive field research and expert consultations, and with reference to the cultural heritage tourism planning documents of Wuhan46, a preliminary design of thematic trails in Wuhan is presented below (Fig. 9).

In terms of multi-source data extraction based on the Wuhan database, this study draws on three kinds of data (Table 3).
The first is static data acquisition. The analysis of multi-source static urban data facilitates the identification of heritage route segments with optimal cultural resource density and route planning efficiency, forming the foundational basis for optimizing the visitor experience. The specific data indicators and corresponding data acquisition mechanisms are summarized in Table 4.
The second is dynamic data acquisition. During the user’s tour, the system dynamically adjusts and updates the tour route in real time based on information from attractions, city traffic data47, and weather conditions48. For example, if the user deviates from the established route, the system will prompt them to adjust their direction or skip nodes that are too distant. In the event of sudden weather changes, the system advises appropriate actions, such as purchasing rain gear or opting to visit indoor attractions. If congestion occurs around a specific attraction, the system recommends alternative transport modes. The indicators monitored in real time on the platform, along with their descriptions, are presented in Table 5 below.
At the current stage, all static data indicators listed in Table 4 have been integrated and applied in the computational experiments conducted in this study. However, the dynamic data adjustment system described above remains conceptual and has not yet been implemented or empirically tested. Its design represents a prospective enhancement for future platform development. No real-time simulation or field deployment of the dynamic module has been performed within the scope of the present study.
The third is user feature data extraction. User features are derived from individual behavioral patterns and activity history. When such behavioral data are insufficient, users are encouraged to manually provide preference information prior to departure. Before executing the routing algorithm, the system assesses the volume of the target user’s tourism history data: ① if the data volume exceeds a predefined threshold, a preference-based recommendation algorithm is applied to generate personalized travel tags for the user; ② if the data volume is equal to or less than the threshold, the system employs a default (equal-weight) routing recommendation algorithm to suggest attractions.
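The threshold rule above can be sketched as follows. The preference-based recommendation algorithm itself is not specified in this section, so a simple tag-frequency profile is used here purely as a stand-in; the function name, the tag representation, and the threshold value of 20 records are all hypothetical.

```python
DIMENSIONS = ("thematic", "diversity", "convenience",
              "comfort", "rapidity", "economy")


def objective_weights(history_tags, threshold=20):
    """Choose objective weights from a user's tourism history.

    history_tags: list of dimension names, one per past activity record.
    ASSUMPTION: tag frequency stands in for the unspecified
    preference-based recommendation algorithm.
    """
    if len(history_tags) > threshold:
        counts = {dim: history_tags.count(dim) for dim in DIMENSIONS}
        total = sum(counts.values())
        if total > 0:
            return {dim: counts[dim] / total for dim in DIMENSIONS}
    # Too little history (or no recognizable tags): default equal weights.
    return {dim: 1.0 / len(DIMENSIONS) for dim in DIMENSIONS}


sparse_user = objective_weights(["economy"] * 5)
active_user = objective_weights(["thematic"] * 15 + ["economy"] * 15)
```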
User locations are obtained in real time via GPS-based mobile device positioning services. This serves as the basis for defining the starting point of the tour route and enables dynamic adjustment during the tour.
While the user is on the move, the system continuously monitors their location in order to adapt the navigation path and attraction information based on real-time traffic conditions, timing constraints, and user feedback. The mechanism for collecting and processing user location data is illustrated in Fig. 10 below.

Location information extraction mechanism.
During the tour, if the system detects that the attraction at the user’s next stop will be closed during the expected visiting time, it suggests that the user skip that node and proceed directly to the following one. If the user’s visit at a certain attraction runs too long, the system reminds the user to speed up at the subsequent attractions or to skip attractions with a relatively low recommendation value.
Additionally, upon completion of the tour, users are invited to complete a satisfaction questionnaire covering dimensions such as landscape, transport, and service experience49,50,51. These assessments are utilized to improve the system and will be incorporated into subsequent iterations of the Thematic Route Recommendation Mechanism.
In terms of the data processing, this study compiled twelve evaluation indicators relevant to cultural heritage attractions within the historical city of Wuhan, including Coordinate Position (CO), Thematic Relevance (TR), a set of attraction evaluation indicators (HRV through DT), as well as Travel Time (TT) and Visitor Cost (C).
Due to the extensive data volume, analysis focused on 25 representative cultural heritage attractions located in the Wuchang Historic District. Their corresponding static data were incorporated into the planning experiments, serving as empirical input for the improved NSGA-III algorithm.
For the data set processing, the dataset utilized in this study is presented in Table 6 below. It was normalized (All indicator values were normalized to a [0, 1] scale using min-max normalization to ensure comparability across metrics and prevent scale bias during multi-objective optimization) for use in the experiments (see Appendix for the full table contents).
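The min-max normalization mentioned above maps each indicator column to [0, 1] via x' = (x − min) / (max − min). A minimal sketch follows; the handling of a constant column (mapped to 0) is an assumption, since the text does not say how that case is treated, and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a list of indicator values to the [0, 1] interval."""
    low, high = min(values), max(values)
    if high == low:
        # ASSUMPTION: a constant indicator carries no information; map to 0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical visitor costs for three attractions.
normalized_cost = min_max_normalize([10, 20, 30])
```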
For the optimization objective and constraint setting, firstly, the objective function is modeled (In a many-objective optimization model, there are usually two or more conflicting objectives, and the goal of optimization is to find an equilibrium such that each objective is satisfied to some extent) based on the subjective indicators listed in Table 2, as follows:
$$\mathrm{maximize}\left\{\begin{array}{l}\mathrm{thematic}=\sum TR_{i}\\ \mathrm{diversity}=\sum\limits_{j=1}^{n}\omega_{D,j}\cdot \sum\limits_{i=1}^{m}f_{D,j,i}\\ \mathrm{convenience}=\sum\limits_{j=1}^{n}\omega_{C,j}\cdot \sum\limits_{i=1}^{m}f_{C,j,i}\\ \mathrm{comfort}=\sum\limits_{j=1}^{n}\omega_{F,j}\cdot \sum\limits_{i=1}^{m}f_{F,j,i}\end{array}\right.$$
(3.1)
$$\mathrm{minimize}\left\{\begin{array}{l}\mathrm{travel\_time}=\sum T_{i}+\sum TT_{i}\\ \mathrm{economy}=\sum C_{i}\end{array}\right.$$
(3.2)
where ω denotes the preset weights of the value indicators, that is, the relative importance assigned to the different optimization objectives or evaluation indicators in many-objective optimization. These weights reflect the extent to which each underlying indicator influences the final outcome during the optimization process. The weight values are derived from the matrix R (Eq. 2.1).
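To make Eqs. (3.1) and (3.2) concrete, the sketch below evaluates the six objective values for one candidate route. The data layout (dictionary keys `TR`, `T`, `cost`, `f`), the weights, and the toy numbers are hypothetical; only the structure of the sums follows the equations, with D, C, and F indexing the Diversity, Convenience, and Comfort indicator groups.

```python
def evaluate_route(route, attractions, travel_time, weights):
    """Compute the six objective values of Eqs. (3.1)-(3.2) for a route.

    route: ordered list of attraction indices.
    attractions[i]: dict with TR (thematic relevance), T (dwell time),
        cost (visitor cost) and f (indicator scores per group D, C, F).
    travel_time[a][b]: travel time from attraction a to attraction b.
    weights[g]: list of indicator weights for group g.
    """
    def weighted_sum(group):
        # sum_j w_{g,j} * sum_i f_{g,j,i}
        return sum(
            w * sum(attractions[i]["f"][group][j] for i in route)
            for j, w in enumerate(weights[group])
        )

    dwell = sum(attractions[i]["T"] for i in route)
    transit = sum(travel_time[a][b] for a, b in zip(route, route[1:]))
    return {
        "thematic": sum(attractions[i]["TR"] for i in route),
        "diversity": weighted_sum("D"),
        "convenience": weighted_sum("C"),
        "comfort": weighted_sum("F"),
        "travel_time": dwell + transit,
        "economy": sum(attractions[i]["cost"] for i in route),
    }


# Hypothetical two-attraction example with two indicators per group.
attractions = [
    {"TR": 3, "T": 60, "cost": 10,
     "f": {"D": [0.5, 1.0], "C": [0.2, 0.4], "F": [1.0, 0.0]}},
    {"TR": 1, "T": 30, "cost": 0,
     "f": {"D": [0.5, 0.0], "C": [0.8, 0.6], "F": [0.0, 1.0]}},
]
weights = {"D": [0.5, 0.5], "C": [1.0, 0.0], "F": [0.25, 0.75]}
travel_time = [[0, 15], [15, 0]]
scores = evaluate_route([0, 1], attractions, travel_time, weights)
```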
Secondly, the optimization process must comply with the following constraints to ensure the planning scheme aligns with the user’s basic situation and needs: ① Time constraint: The total visit time must not exceed the user-defined time limit. This includes the visit time at each attraction and the commuting time between attractions to guarantee completion of the tour within the specified period; ② Budget constraint: The total cost must not exceed the user’s budget. This requires the rational selection of attractions during route planning to control overall expenses. Accordingly, the time constraint (Max_Time) is set as:
$$\sum\limits_{i=1}^{n}T_{i}+\sum\limits_{i=1}^{n}TT_{i-1}\le \mathrm{Max\_Time}$$
(3.3)
The budget constraint (Max_Cost) is set as:
$$\sum\limits_{i=1}^{n}C_{i}\le \mathrm{Max\_Cost}$$
(3.4)
In terms of the NSGA-III Parameter Setting, to ensure the trail planning scheme meets users’ actual needs, six optimization objectives are set, with initial objective weights uniformly assigned to allow subsequent adjustment according to user preferences.
Additionally, the algorithm’s maximum number of iterations is set to 150 generations, with an elite proportion retaining 50% of individuals; the crossover probability is set to 0.5, and the mutation probability to 0.2 to enhance population diversity; the maximum number of generations without improvement is set to 200 as a convergence criterion. Detailed parameter settings are shown below.
In terms of the Optimization Process, after generating the initial population, the following steps are performed: ① Crossover operation: Crossovers are performed on parent individuals to generate offspring and increase population diversity; ② Mutation operation: Mutation is applied to some individuals to further enhance diversity; ③ Non-dominated sorting: Parent and offspring populations are merged and sorted to identify the Pareto frontier; ④ Introduction of reference points: A set of reference points (or ideal points) is generated based on the number of objective functions to ensure uniform distribution of solutions; ⑤ Elite retention: The highest-scoring individuals in the current population are retained to ensure transmission of high-quality solutions; ⑥ Output of results: Ultimately, the highest scoring individual from the population is selected to output the optimal travel route, including route, total time, total cost and score.
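The non-dominated sorting in step ③ can be illustrated with the short sketch below. It assumes every objective is expressed in minimization form (maximized objectives negated) and uses a simple repeated-filtering scheme rather than the faster bookkeeping found in production NSGA implementations; the function names and the toy population are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated_sort(population):
    """Split a population into successive Pareto fronts.

    population: list of objective vectors.
    Returns a list of fronts, each a sorted list of indices; the first
    front is the current Pareto frontier.
    """
    remaining = set(range(len(population)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(population[j], population[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Toy population with two objectives, e.g. (travel_time, economy).
toy_population = [(1, 5), (2, 2), (5, 1), (3, 3), (6, 6)]
fronts = non_dominated_sort(toy_population)
```

In this toy case the first three vectors trade one objective against the other and form the first front, while (3, 3) and (6, 6) fall into later fronts because each is dominated by a member of the preceding one.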
