March 25, 2025


Evaluation of tourism elements in historical and cultural blocks using machine learning: a case study of Taiping Street in Hunan Province

Overview of the study area

The study selects Taiping Old Street, one of the historic and cultural blocks in Changsha City, as the research object (Fig. 1). Taiping Old Street is located in the heart of the Wuyi business district in Changsha, Hunan Province, adjacent to the Xiangjiang River. Covering an area of 5.07 hectares, it is one of the best-preserved historical and cultural neighborhoods in Changsha’s ancient city. Its origins date back to the Ming and Qing dynasties, when it served as an important commercial hub and cultural exchange center. The street is lined with numerous historical buildings, featuring small green tiles, horse-head walls, and intricately carved doors and windows, which together create a distinctive architectural style. These structures embody rich historical memories and cultural heritage. This study selects Taiping Old Street as the research subject for the following reasons.

Fig. 1

First, Taiping Old Street exhibits the typical characteristics of a historical district. It has a long history, tracing back to ancient times, and as the only historical street in Changsha preserved since the Ming and Qing Dynasties, it embodies the quintessential features of a historical neighborhood.

Second, Taiping Old Street holds significant commercial tourism value. Since its renewal and reconstruction in 2006, the street has managed to preserve its deep historical and cultural heritage while seamlessly integrating modern commercial elements. This fusion has created a unique blend where traditional culture and contemporary commerce coexist harmoniously. The area is home to a wide variety of businesses, including traditional local snack shops, souvenir stores, and handicraft shops, which collectively attract a large number of tourists.

Third, Taiping Old Street generates a substantial volume of review data on social media platforms, which can be leveraged for analysis. Changsha’s Wuyi Business District receives 30 million tourist visits annually, ranking second among the national core business districts in terms of traffic. As one of the key tourist destinations, Taiping Old Street benefits from significant foot traffic, resulting in a wealth of comment data available for analysis.

Fourth, Taiping Old Street faces several challenges common to historic districts. The sharp increase in tourist numbers has brought notable economic benefits but has also led to issues such as traffic congestion, particularly during peak seasons, which negatively impacts the visitor experience. Additionally, the area suffers from an imbalance in its functional development, with the catering industry dominating the street. There are also challenges in balancing commercial development with the preservation of historical and cultural assets. Some shop owners, in pursuit of profit, have over-decorated their establishments, and the excessive introduction of modern commercial elements has compromised the original, simple aesthetic of the historic buildings.

Currently, scholars have conducted various studies on Taiping Old Street, primarily focusing on the distribution of business forms44, quality evaluation45, and the impact of renewal and renovation46 on the neighborhood. For instance, Xu et al. investigated the community characteristics of the businesses on Taiping Old Street, analyzed the intrinsic connections between the businesses, and explored their symbiotic mechanisms44. Sun et al. employed multi-source data to assess the quality of Taiping Old Street45, while Wang et al. examined the renewal and renovation of the area to explore the potential value of the historic district in the development of a new type of urban space46. However, despite these studies, a comprehensive and scientific evaluation system for the tourism elements of Taiping Old Street remains lacking. This study seeks to address this gap by constructing a machine learning-based evaluation system to deeply analyze the tourism elements of Taiping Old Street and provide a scientific foundation for the sustainable tourism development of the historic district.

Research framework

The article presents a machine learning-based analysis method to develop an evaluation system for historic districts. The first step involves gathering research data. Using “Taiping Old Street” as the keyword, all visitor reviews from the evaluation section on the Dianping website are collected. The second step focuses on data preprocessing. The initial processing includes tasks such as removing stopwords, adding key terms, and merging synonyms to prepare the data for evaluation index selection and weight calculation. The jieba Python library is then used to further segment the text, resulting in DATE1. Subsequently, the data required for sentiment analysis are also preprocessed. Sentences are segmented by punctuation, and text data exceeding 10 Chinese characters are selected. The data are then randomly divided into training and test sets in a 6:1 ratio, forming DATE2. The third step involves analyzing the collected data. First, the optimal number of topics is calculated based on perplexity scores, and topic clustering is performed on DATE1 using the LDA model. This forms the evaluation indexes for the tourism elements of the historic district, derived from the overall characteristics of each topic. The weight of each index is calculated based on keyword frequency within these topics. Next, sentiment analysis is conducted on DATE2 using the BERT model, yielding a probability distribution of positive, neutral, and negative sentiments for each topic. In the fourth step, the evaluation results are normalized across two dimensions: the weights of the evaluation indicators and the probability of positive sentiment distribution. Using the weights to represent importance and the positive sentiment distribution to represent satisfaction, IPA analysis is applied to visually illustrate the performance-satisfaction outcomes for each indicator (Fig. 2).

Fig. 2

Data sources and processing

Data sources

This study utilizes reviews of Taiping Old Street from VW Dianping as the primary data source. VW Dianping, China’s leading platform for local lifestyle information and transactions, is also recognized as the world’s first independent third-party consumer review website. As a professional review platform, it has a large user base and provides comprehensive evaluation data. Using Python, the comment texts related to Taiping Old Street were scraped from VW Dianping’s review section. The collected data include the reviewer’s name, review date, rating score, and review content, yielding a total of 8872 initial reviews.

Data processing

(1) Evaluation indicators and weights

This study crawls review texts related to Taiping Old Street from popular review platforms using Python. The raw review texts are often written in non-standard language and contain a significant amount of noise, so data processing is necessary to make them usable for machine learning. The processing consists of six steps. First, preliminary cleaning: the comment information is organized and irrelevant data such as duplicate comments, advertisements, and unrelated entries are removed, leaving a total of 8800 valid comments. Second, removal of unnecessary words: the Jieba library in Python preprocesses the raw data by removing filler words and meaningless terms; using a Chinese stop-word list, irrelevant words, quantifiers, adverbs, and symbols that could distort the thematic analysis (such as “haha,” “wow,” “one,” and “all the time”) are deleted. Third, a custom dictionary is created: local characteristic terms from Taiping Old Street, such as “Cha Yan Yue Se,” “Sugar and oil,” and “Jia Yi’s former residence,” are added to the dictionary, improving the accuracy of word segmentation and strengthening the relevance between the vocabulary and the research object. Fourth, near-synonyms are replaced: a thesaurus of near-synonyms is constructed, and all synonyms are treated as equivalent terms; for example, “Taiping Street,” “Taiping Old Street,” and “Old Street” are uniformly replaced with “Taiping Old Street,” while “shop” and “small shop” are consolidated into “shop.” Fifth, the Jieba library segments the processed data and extracts keywords from the comment text. Sixth, frequency statistics are computed on the segmented data to identify high-frequency topic words and generate a vocabulary frequency table. After these steps, a total of 173,811 valid data samples were obtained, referred to as DATE1.
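The six preprocessing steps can be sketched in Python. In practice the token stream would come from jieba segmentation (jieba.lcut after loading the custom dictionary with jieba.load_userdict); the stop-word set and synonym map shown here are illustrative stand-ins, not the study's actual lists:

```python
from collections import Counter

def preprocess(tokens, stopwords, synonyms):
    """Steps 2-4 applied to one comment's token list.

    `tokens` stands in for jieba's segmentation output; the stop-word
    set and synonym map are illustrative, not the study's real lists.
    """
    cleaned = []
    for tok in tokens:
        tok = synonyms.get(tok, tok)            # step 4: merge near-synonyms
        if tok in stopwords or not tok.strip():
            continue                            # step 2: drop stopwords/fillers
        cleaned.append(tok)
    return cleaned

def frequency_table(token_lists):
    """Step 6: word-frequency statistics over all cleaned comments."""
    counts = Counter()
    for toks in token_lists:
        counts.update(toks)
    return counts
```

The synonym map implements the replacements described above, e.g. {"Taiping Street": "Taiping Old Street", "Old Street": "Taiping Old Street"}.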

(2) Sentiment analysis

In Chinese writing, a complete sentence often contains multiple elements. For instance: “The food here is quite good, but there are too many people, you have to queue up everywhere you go, and there is no resting place.” This single comment addresses the food, the environment, and the infrastructure; the sentiment about the food is positive, while the sentiment about the environment and infrastructure is negative. Therefore, in sentiment analysis of review texts, it is essential to divide each review into multiple phrases. The data processing consists of three steps. First, utterance segmentation: in Chinese texts, punctuation commonly marks clause boundaries, so this study adopts punctuation marks such as the comma (“，”), full stop (“。”), question mark (“？”), and exclamation mark (“！”) as the basis for text segmentation, ensuring maximum correspondence between each topic and its evaluation content; this yielded a total of 84,082 data points. Second, screening: longer statements are more conducive to semantic analysis in context, so segments containing more than 10 Chinese characters are selected as samples for sentiment analysis, resulting in 27,013 entries. Third, deletion of useless data: blank comments, repeated comments, and comments with no sentiment inclination were removed. After these steps, 25,638 valid samples containing thematic evaluation content were obtained, named DATE2.
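The clause-splitting and length filter can be sketched with a regular expression. The delimiter set mirrors the text; for simplicity len() here counts all characters, not only Chinese ones:

```python
import re

# Clause delimiters: Chinese comma, full stop, question and exclamation
# marks, plus their ASCII counterparts.
DELIMS = r"[，。？！；,\.?!;]+"

def split_review(review, min_chars=10):
    """Split one review into clauses and keep those longer than min_chars."""
    clauses = [c.strip() for c in re.split(DELIMS, review)]
    return [c for c in clauses if len(c) > min_chars]
```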

Research methodology

Evaluation of the construction of indicators

In this study, evaluation metrics are constructed from the topics and high-frequency keywords extracted by the LDA topic model, one of the most powerful techniques in text mining for data analysis, latent pattern discovery, and identifying relationships between data and text documents25. The LDA topic model is highly effective at analyzing the complex structure of documents, topics, and words in review texts, revealing hidden topic patterns and relationships. By using LDA, meaningful evaluation indices can be extracted from large volumes of unstructured text data, enhancing the scientific rigor of index selection and reducing the bias inherent in manual classification. The LDA model uncovers the probability distribution of topics in review texts, such as those in the VW Review dataset, by analyzing the three-tier structure of documents, topics, and words. As shown in Eq. (1), this analysis produces a matrix of m topics and n keywords: each row gives the probability distribution of one topic over the n words, and each column gives the probability distribution of one word across the m topics.

$$\begin{bmatrix} & {w}_{1} & \cdots & {w}_{n}\\ {t}_{1} & P\left({w}_{1}\mid {t}_{1}\right) & \cdots & P\left({w}_{n}\mid {t}_{1}\right)\\ \vdots & \vdots & \ddots & \vdots \\ {t}_{m} & P\left({w}_{1}\mid {t}_{m}\right) & \cdots & P\left({w}_{n}\mid {t}_{m}\right)\end{bmatrix}$$

(1)

This article adopts Perplexity, a well-established metric, to determine the optimal number of topics K. Perplexity is a key measure in natural language processing of a language model’s predictive power. It has a solid theoretical foundation and is widely applied in the evaluation of topic models. The formula for calculating topic perplexity is presented in Eq. (2):

$${Perplexity}={e}^{\frac{-\sum \log \left(p\left(w\right)\right)}{N}}$$

(2)

In this equation, p(w) represents the probability of each word in the test set, and N denotes the number of words or the total length of the test set. The rationale behind this is that when the model has a higher probability of predicting words in the test set, the perplexity value decreases, indicating that the model performs better on that test set. In other words, a lower perplexity suggests that the model’s thematic structure is relatively stable and the expected error is smaller. A lower perplexity also means the model can capture the semantic information in the text more accurately, thus fitting the data better. The effectiveness and importance of perplexity in topic model evaluation have been discussed in numerous studies. For example, Hoffman et al. used perplexity to evaluate the performance of LDA models on different datasets and demonstrated that perplexity effectively reflects the degree to which the model fits the data, providing an important basis for model parameter selection and optimization47. Wallach et al. elaborated on the advantages of perplexity as an evaluation metric for topic models, emphasizing its key role in measuring the relationship between the model and the data, which provided strong theoretical support and practical reference for its use in this study48.

In this study, we calculate the perplexity for different numbers of topics using the scikit-learn package in Python. The specific results show that the perplexity is lowest when the number of topics is 7, indicating that this is the optimal equilibrium point, where the model achieves its best predictive performance.
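A minimal sketch of this perplexity sweep with scikit-learn's LatentDirichletAllocation; for brevity it scores perplexity on the training matrix rather than a held-out split, and the documents and K range below are illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def best_topic_number(docs, k_range=range(2, 11), seed=0):
    """Fit one LDA model per candidate K; return the K with lowest perplexity."""
    X = CountVectorizer().fit_transform(docs)
    scores = {}
    for k in k_range:
        lda = LatentDirichletAllocation(n_components=k, random_state=seed).fit(X)
        scores[k] = lda.perplexity(X)   # lower perplexity = better fit
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

In the study the sweep reached its minimum at K = 7; with other data the curve, and hence the chosen K, will differ.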

The process of constructing the evaluation index is as follows: First, high-frequency keywords are screened. DATE1 is used as the data sample, and the trained LDA topic model classifies DATE1 into 7 topics, generating a “topic-vocabulary” matrix. The selection of high-frequency topic words is based primarily on word frequency statistics, supplemented by the TF-IDF index49. The top 10 most frequent words in each topic are selected as high-frequency candidates, which are then verified using the TF-IDF index. To ensure low-frequency keywords are not overlooked, a review panel was established, consisting of experts in the fields of culture and tourism, senior merchants from Old Street, and frequent travelers. This panel manually screens low-frequency words associated with the model-generated topics to identify potential high-impact indicators50. Additionally, this process was used to fine-tune the parameters of the LDA model, moderately increasing the weight of low-frequency keywords in subsequent iterations. This adjustment helps the model adapt more effectively to complex tourism scenarios, taking into account both high-frequency general terms and low-frequency specific features.

Next, the evaluation team, comprising culture and tourism experts, senior merchants, and frequent travelers, generated evaluation indicators by summarizing the characteristics of the high-frequency keywords in each topic. For example, topic 1, which is closely related to local snacks, business, food, shopping, drinks, entertainment, and other factors, reflects the basic needs of tourists and is therefore labeled “Regional Function.” Similarly, topics 2 through 7 were labeled Spatial Accessibility, History and Culture, Environmental Features, Local Characteristics, Management Services, and Tourism Experience.

Evaluation indicator weights

In this study, the weights of the topics are determined based on the frequency of the corresponding keywords within the “topic-vocabulary” matrix. The high-frequency keywords are then summarized to derive feature words that represent the content of each topic. The weight of each topic is calculated using Eq. (3).

$${I}_{k}=\frac{{n}_{k,j}}{\sum _{k}{n}_{k,j}}$$

(3)

In this formula, Ik represents the weight of the topic, nk,j is the number of occurrences of the keyword associated with the topic in document dj, and the denominator is the total occurrences of all keywords in document dj. The weights of each topic were calculated using Eq. (3) based on the results of the seven topics identified by the LDA model.
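Eq. (3) reduces to a simple normalization of keyword counts over a document; a minimal sketch (the topic names and counts below are illustrative, not the study's actual frequencies):

```python
def topic_weights(keyword_counts):
    """Eq. (3): weight of topic k = frequency of its keywords / total frequency."""
    total = sum(keyword_counts.values())
    return {topic: n / total for topic, n in keyword_counts.items()}

# Illustrative counts only
weights = topic_weights({"Regional Function": 50,
                         "History and Culture": 30,
                         "Tourism Experience": 20})
```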

Sentiment analysis

In this study, the BERT model is employed to conduct sentiment analysis, offering deeper contextual sentiment capture compared to traditional natural language processing (NLP) techniques such as Word2Vec, CNN, RNN, and Bi-LSTM51. The BERT model consists of an input layer, a Transformer encoder layer, and an output layer. First, comment text from DATE2 is processed, with each piece of data transformed into a sequence of word embeddings, starting with [CLS] tags and ending with [SEP] tags. This sequence is then passed through BERT’s Transformer encoder to extract deep contextual semantic information. Specifically, the model takes as input the embedding containing the [CLS] token (X0) and the text sequence embeddings (X1 to XN). These inputs are then encoded through BERT’s multi-layer Transformer structure to generate high-level feature representations. Finally, the output layer uses a softmax function to transform the encoded representations into probability values for each classification result, indicating the likelihood of each sentiment category (Fig. 3).

Fig. 3 Sentiment analysis framework.
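The output-layer step can be illustrated in isolation. In the actual pipeline the [CLS] representation comes from a pretrained Chinese BERT encoder (e.g. via the transformers library); here a placeholder vector and a randomly initialized classification head stand in for it, so only the final softmax mapping to the three sentiment probabilities is shown:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify_cls(cls_vector, W, b, labels=("positive", "neutral", "negative")):
    """Map an encoder's [CLS] representation to sentiment probabilities.

    cls_vector stands in for BERT's pooled output; W and b are the
    classification head's weights (placeholders here, learned in practice).
    """
    probs = softmax(W @ cls_vector + b)
    return dict(zip(labels, probs))

# Placeholder [CLS] vector and head, for illustration only
rng = np.random.default_rng(0)
h = rng.normal(size=8)
W = rng.normal(size=(3, 8))
p = classify_cls(h, W, np.zeros(3))
```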

After this process, the model can accurately identify the emotional tendencies of the comment text and classify the text into three categories: positive, negative, and neutral emotions. Finally, subject satisfaction is calculated based on the classification results, with the calculation formula presented in Eq. (4).

$${P}_{k}=\frac{{w}_{p}\times P+{w}_{n}\times N+{w}_{{neg}}\times {Neg}}{T}$$

(4)

Where P represents the number of positive comments, N is the number of neutral comments, Neg is the number of negative comments, and T is the total number of comments for a given topic. The weights for each sentiment are defined as: Wp = 1 for positive comments, Wn = 0 for neutral comments, and Wneg = −1 for negative comments. DATE2 is used as the data sample for sentiment analysis, with the data classified according to the keywords of the topics in each piece of data before the analysis. Using the LDA model, the keywords of each topic are identified and matched with the corresponding comments from DATE1. This keyword matching determines the relevant topic for each comment. For example, in DATE2, if a comment states, “The sugar and oil here is super delicious,” the keyword “sugar and oil” can be located in DATE1. Based on this match, the comment is classified into the corresponding topic.
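With the stated weights (Wp = 1, Wn = 0, Wneg = −1), Eq. (4) simplifies to the share of positive minus negative comments; a minimal sketch:

```python
def topic_satisfaction(pos, neu, neg):
    """Eq. (4): P_k = (1*pos + 0*neu + (-1)*neg) / total; ranges from -1 to 1."""
    total = pos + neu + neg
    return (pos - neg) / total
```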

The IPA method of analysis

Importance-Performance Analysis (IPA) is used to gain detailed insights into the performance of tourism elements on Taiping Old Street. IPA was first introduced by Martilla and James in 197752. Its fundamental concept is to assess the importance of the factors influencing user satisfaction and to evaluate how those factors actually perform as experienced by users, so as to identify the strengths and weaknesses of the evaluated elements. The IPA four-quadrant diagram provides a clear visualization of both the development priorities for tourism factors in Taiping Old Street and the level of user satisfaction with these factors, based on users’ evaluations. In this analysis, the weight of each topic is taken as “Importance,” as shown in Eq. (5), while the distribution of topic satisfaction is taken as “Performance,” as in Eq. (6). “Importance” is used as the horizontal coordinate and “Performance” as the vertical coordinate; the methods for calculating “Ik” (Importance) and “Pk” (Performance) were introduced in Eqs. (3) and (4) above. To establish the four-quadrant graph, the average values of “Importance” and “Performance” serve as the dividing lines for the horizontal and vertical axes, respectively. On this basis, the IPA diagram divides the evaluation indicators into four quadrants: Advantage, Opportunity, Vulnerable, and Patch, allowing a visual representation of the performance satisfaction of each tourism element.

$${{{Importance}}_{k}=I}_{k}=\frac{{n}_{k,j}}{\sum _{k}{n}_{k,j}}$$

(5)

$${{Performance}}_{k}={P}_{k}=\frac{{w}_{p}\times P+{w}_{n}\times N+{w}_{{neg}}\times {Neg}}{T}$$

(6)
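The quadrant assignment with mean-value cross-hairs can be sketched as follows; note that the mapping of the four quadrants to the labels Advantage, Opportunity, Vulnerable, and Patch is an assumption, since the source does not spell it out:

```python
def ipa_quadrants(importance, performance):
    """Assign each indicator to an IPA quadrant using mean-value cross-hairs.

    Labels follow the conventional IPA reading (high/high = strength,
    high importance with low performance = weakness); the mapping to the
    paper's four terms is assumed, not stated in the source.
    """
    i_bar = sum(importance.values()) / len(importance)
    p_bar = sum(performance.values()) / len(performance)
    quadrants = {}
    for k in importance:
        hi_i = importance[k] >= i_bar
        hi_p = performance[k] >= p_bar
        quadrants[k] = ("Advantage" if hi_i and hi_p else
                        "Vulnerable" if hi_i else
                        "Opportunity" if hi_p else
                        "Patch")
    return quadrants
```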

Model performance evaluation

To evaluate the validity and reliability of the constructed model, we employed a cross-validation approach. The dataset was divided into seven equal subsets. In each iteration, one subset was selected as the validation set, and the remaining six subsets were used for training. The model was trained on the training data, and evaluation metrics were computed based on the validation set. This procedure was repeated such that each subset was used as the validation set once. The final performance metric was determined by averaging the evaluation results from all iterations. In this study, we primarily used four evaluation metrics: Accuracy, Precision, Recall, and the F1 score. Accuracy is defined as the proportion of correctly predicted instances out of the total number of samples. Precision refers to the proportion of true positive predictions among all predicted positive samples. Recall represents the proportion of true positives among all actual positive instances, and the F1 score is the harmonic mean of Precision and Recall53, which is calculated using the following formula:

$${Accuracy}=\frac{{TP}+{TN}}{{TP}+{TN}+{FP}+{FN}}$$

(7)

$${Precision}=\frac{{TP}}{{TP}+{FP}}$$

(8)

$${Recall}=\frac{{TP}}{{TP}+{FN}}$$

(9)

$$F1=\frac{2\times {Precision}\times {Recall}}{{Precision}+{Recall}}$$

(10)

In this study, the following terms are used to define the model’s performance: TP (True Positive) represents the number of samples that are actually positive and predicted as positive by the model; TN (True Negative) refers to the number of samples that are actually negative and predicted as negative by the model; FP (False Positive) denotes the number of samples that are actually negative but predicted as positive by the model; and FN (False Negative) represents the number of samples that are actually positive but predicted as negative by the model. The results of the cross-validation for both the LDA and BERT models are presented in Table 1.
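Eqs. (7)-(10) follow directly from the four confusion-matrix counts; a minimal sketch:

```python
def classification_metrics(tp, tn, fp, fn):
    """Eqs. (7)-(10): Accuracy, Precision, Recall, and F1 from TP/TN/FP/FN."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```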

Table 1 The results of cross-validation

The results show that the LDA model achieves high accuracy and F1 score in topic classification, indicating its effectiveness in clustering review texts and accurately identifying the topics of different tourism elements. The BERT model also performs well in sentiment analysis, with high accuracy, recall, and F1 scores, demonstrating its ability to accurately assess the sentiment of the texts and provide a better understanding of tourists’ evaluations for precise classification.
