Sharing of Department Summer Internship 2020
NG, Siu Yan, BSc in Statistics
For this internship programme, I was assigned to the Science and Technology Section of the Census and Statistics Department (C&SD). This section mainly collects statistics that reflect public technology usage and the status of innovation in Hong Kong.
One of my major tasks was to manage a large database by creating a new user interface in Microsoft Access. To complete this task, I had to learn to use Access VBA and SQL, which were completely new to me. At first, this was quite challenging, as my weakness lies in programming, and I had to learn these languages on my own. However, with the assistance of colleagues, I adapted to the programming work. I was also assigned a research job that involved collecting and summarizing information to serve as a quick reference for survey interviewers during the data collection process. From these tasks, I learned not only new programming skills, but also how the Science and Technology Section processes various surveys.
I also gained a lot of knowledge from my supervisor at CUHK, Professor Fang Xiao. I was required to analyze a dataset on COVID-19 in Hong Kong to estimate and predict the number of daily cases. Although I found this difficult, by following my supervisor’s instructions I was able to complete the task and write a report on the foundations of the data analysis. This helped me to gain a deeper understanding of statistical techniques.
The Professional Attachment Programme is generally very good. This programme offers an ideal opportunity for students who want to gain more working experience and understand the work of the C&SD.
DENG, Rongchen, BSc in Statistics
I was very pleased to be assigned to the Trade Statistics Processing Section of the Trade Statistics Branch (2), supervised by Ms Ng. This section mainly collects and analyses import/export declarations and cargo manifest records. I was responsible for using deep learning methods to help match import/export declarations with electronic cargo manifests.
I welcomed the adventure of exploring deep learning and natural language processing. As I lacked knowledge of deep learning, Python packages and trading data, I initially had a tough time understanding the relevant techniques, such as word embedding and Siamese networks, and applying the Python packages to the shipping data. Thankfully, my supervisor was patient and willing to give advice when I encountered difficulties. I finally managed to break the task down into pieces and gradually learn the related processes. I also conducted desk research on different models for measuring text similarity to establish whether records match. Under the guidance of the faculty supervisor, Prof Lin, I gained insight into how statistical methods can be applied across diverse fields. I greatly appreciate the freedom Prof Lin allowed me to explore the topics that interested me, and his suggestions throughout the process.
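To illustrate what measuring text similarity between two records can look like, here is a minimal sketch of a character trigram cosine similarity. It is only a simple baseline and not the word-embedding or Siamese-network approach described above; the function names and the sample goods descriptions are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count the character n-grams of a lowercased, whitespace-normalized string."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(text_a, text_b, n=3):
    """Cosine similarity between the n-gram count vectors of two strings."""
    a, b = char_ngrams(text_a, n), char_ngrams(text_b, n)
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one string is shorter than n characters
    return dot / (norm_a * norm_b)


# Made-up descriptions, for illustration only.
declaration = "mens cotton t-shirts"
similar_manifest = "cotton t shirts for men"
unrelated_manifest = "frozen chicken wings"
```

A pair of records could then be treated as a candidate match when the score exceeds a threshold tuned on labelled examples; a learned model would replace this hand-crafted score.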
I gained a lot from this internship experience. Through dealing with real-world data and experiencing how crude the data were, I changed my mind about data pre-processing. In addition, I honed my problem-solving skills and applied the knowledge I had learned to gain a deeper understanding of the issues. Through this programme, I gained a glimpse of the working environment and was fortunate to work under two supportive supervisors, along with a wonderful group of colleagues. I am grateful to have had this opportunity to join the department’s Summer Internship Programme.
LEUNG, Hoi Ching, BSc in Risk Management Science
During my internship at the C&SD, I was honored to be assigned to the Technical Secretariat Section, a special section that performs ad hoc tasks and provides technical support. My main duty was to research and compare the microdata access schemes provided by the national statistics offices of advanced economies. “Microdata” in this context refers to the individual records of households or businesses, which are very useful for research but raise concerns about confidentiality. Eurostat and UNECE hold a biennial international conference on statistical confidentiality and statistical disclosure control (SDC), and I was required to summarize the key issues in the research papers presented at these events by participating professors from international universities. The topics covered different aspects of microdata usage, such as microdata generation, protection and dissemination.
During these two months, I became familiar with the microdata access schemes available in advanced economies such as the UK, Canada and the Netherlands. I was amazed by the variety and convenience of some of these schemes, which provide great flexibility and utility for researchers while simultaneously ensuring data confidentiality. After my research, I had a more thorough understanding of current developments and the tools available to perform SDC on datasets. My supervisor also introduced me to Hong Kong’s current policy and the ways that confidentiality issues are addressed. There is more to be done to promote data utility and support academic research in Hong Kong.
This internship expanded my understanding of statistics, particularly the practical aspects. Although I had learned the basics of data analysis and hypothesis testing at school, most of this was detached from real-world constraints. There are many controls that the government must implement before providing statistics and datasets, which data users like us are often not aware of. I was lucky to gain a glimpse of the operation of the C&SD and explore the work it does. I also met great supervisors and colleagues who gave me plenty of guidance and support during my internship. This will certainly remain an unforgettable experience in my university life.
WANG, Dingdong, BSc in Statistics
I am grateful to have been given the valuable opportunity to work as an intern in the C&SD. I was assigned to the National Income Branch. This branch mainly deals with external merchandise trade statistics but also conducts customer opinion surveys.
The work experience gave me a taste of what the real work of a statistician is like. Over my two-month internship, I was responsible for nowcasting Hong Kong’s GDP using information from search engines. By reading a lot of material on nowcasting and forecasting, and acquiring a basic knowledge of macroeconomics, I learned a lot about how to apply statistics in economics. With the guidance of my supervisor, I built a nowcasting model and wrote an automation programme in R, using Google Trends data as the main data source. Through these real-world tasks, I not only consolidated the knowledge I had gained at university, but also recognized the appeal and importance of statistics.
My supervisor, Chan Kin Wai, was very supportive and introduced me to the project he was working on. I acquired substantial knowledge and matured a lot while undertaking this research. I was mainly responsible for single and multiple imputation analysis based on the notion of data depth. This was a major challenge for me, but with my supervisor’s patient guidance, I stayed on the right track and eventually completed the task. The challenge of entering the unknown, learning R from scratch to code the algorithm, and having to read a lot of relevant statistical literature brought me plenty of surprises and a sense of achievement.
Overall, this was a precious opportunity and an enlightening experience, and I greatly appreciate the efforts of the Department of Statistics to make this internship possible.
ZHANG, Qianhua, BSc in Quantitative Finance and Risk Management Science
This summer I worked at the Labour Statistics Division of the C&SD. I was responsible for finding and comparing alternative seasonal adjustment methods during the COVID-19 period. I encountered various difficulties during this research project. For example, it took substantial effort to learn the X-12-ARIMA software (the software the C&SD used at the time for seasonal adjustment). I identified two alternative software systems from a literature review and compared their performance by running simulations in R. I concluded the project with a 20-page report recommending to the C&SD an alternative that could provide better seasonal adjustment.
I consolidated my research skills through the literature review and strengthened my coding skills by implementing models in R and X-12-ARIMA. The research and analysis process also pushed me to pick up new skills, such as building pivot tables for data analysis and using the “ggplot2” package in R for data visualization. I am grateful to my supervisor, Eddie, who gave me excellent guidance and career advice. I also appreciated the help of my colleagues.
I also worked with Prof. Wei this summer as a research assistant and was responsible for conducting literature reviews in two areas: simulated annealing and parallel tempering. As these topics were both completely new to me, I spent a large amount of time learning the concepts and theories. As I understood more about the theories, I could follow the literature in greater depth and express the findings in my own words. Writing literature reviews also honed my skills in academic writing. In the process, I familiarized myself with the widely used typesetting system LaTeX. At the end of the programme, I wrote code to implement a parallel tempering MCMC algorithm for a high-dimensional mixture model (given in a journal article). This further enhanced my understanding of parallel tempering and strengthened my programming skills. I am most grateful to Prof. Wei for guiding me and providing me with advice during the research process.
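As a rough illustration of the parallel tempering idea, the sketch below runs several random-walk Metropolis chains at different temperatures and occasionally swaps the states of adjacent chains, so that the coldest chain can move between well-separated modes. It is a toy example under stated assumptions: the target is a one-dimensional, two-component Gaussian mixture rather than the high-dimensional model from the journal article, and the temperature ladder, step sizes and function names are all hypothetical choices.

```python
import math
import random


def log_target(x):
    """Log density (up to a constant) of an equal mixture of N(-4, 1) and N(4, 1)."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    top = max(a, b)
    return top + math.log(math.exp(a - top) + math.exp(b - top))


def accept(rng, log_ratio):
    """Metropolis acceptance test for a log acceptance ratio."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def parallel_tempering(n_sweeps, temperatures=(1.0, 3.0, 9.0, 27.0), seed=0):
    """Return the states visited by the coldest chain (temperature 1)."""
    rng = random.Random(seed)
    betas = [1.0 / t for t in temperatures]
    states = [0.0] * len(betas)
    cold_samples = []
    for _ in range(n_sweeps):
        # Within-chain move: random-walk Metropolis on the tempered target p(x)^beta,
        # with a wider proposal for hotter chains.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, 1.0 / math.sqrt(beta))
            if accept(rng, beta * (log_target(proposal) - log_target(states[i]))):
                states[i] = proposal
        # Between-chain move: propose swapping one randomly chosen adjacent pair.
        i = rng.randrange(len(betas) - 1)
        log_swap = (betas[i] - betas[i + 1]) * (
            log_target(states[i + 1]) - log_target(states[i])
        )
        if accept(rng, log_swap):
            states[i], states[i + 1] = states[i + 1], states[i]
        cold_samples.append(states[0])
    return cold_samples


samples = parallel_tempering(5000)
```

The hot chains see a flattened version of the target and cross between modes easily; the swap step passes those crossings down to the cold chain, which a single Metropolis chain at temperature 1 would rarely achieve on its own.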
All in all, I gained work experience both as an employee at the C&SD and as a researcher on campus. My exposure to various research projects and to working independently helped me develop a wide range of transferable skills and broadened my horizons in statistics. I am grateful for this opportunity provided by the Statistics Department and the support rendered along the way.
MA, Zhijie, BSc in Statistics
I am very honored to have been selected to serve as a Junior Research Assistant at the Centre for Clinical Research and Biostatistics (CCRB) this summer. Working with Prof. Benny Chung Ying Zee, the director of the CCRB at the JC School of Public Health and Primary Care, along with his team, was a very meaningful and enjoyable experience.
During this internship programme, I was asked by Prof. Zee to construct a new search engine based on the Aims academic paper database of CUHK to improve the search performance of Aims. After a discussion with Mr Steven Yuk Fai Lau of the Research Association at the CCRB, I decided to build this new search engine in Python.
I first set up a MySQL database, then used the Scrapy crawler framework to crawl the pages of CUHK’s Aims database and save them to the newly built MySQL database. At the same time, I used NLTK to tokenize the keywords and content of each paper. The code had to ensure that phrases in double quotation marks were not split but searched as a whole, consistent with how the Google search engine treats quoted phrases.
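The quoted-phrase rule can be sketched as follows. This is a minimal illustration using only Python’s standard `re` module rather than NLTK, and the function name `tokenize_query` is hypothetical; it is not the code used in the project.

```python
import re

# Matches either a double-quoted phrase (group 1) or a bare run of
# non-space characters (group 2).
_TOKEN_PATTERN = re.compile(r'"([^"]+)"|(\S+)')


def tokenize_query(query):
    """Split a search query into lowercase terms, keeping quoted phrases whole."""
    terms = []
    for phrase, word in _TOKEN_PATTERN.findall(query):
        if phrase:
            # Collapse internal whitespace so the phrase is one searchable unit.
            terms.append(" ".join(phrase.lower().split()))
        else:
            cleaned = word.strip('"').lower()  # drop a stray unmatched quote
            if cleaned:
                terms.append(cleaned)
    return terms


terms = tokenize_query('randomised "clinical trial" design')
```

Here `terms` keeps `clinical trial` as a single term, while the unquoted words are split as usual.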
I then used the BM25 algorithm in the Gensim package to calculate relevance scores and sorted them in descending order, so that the papers most closely related to the user’s search keywords were displayed first. Finally, I built the web interface with Flask, Python’s lightweight web application framework, and deployed the search engine to a cloud server. The search engine supports combined retrieval across paper title, abstract and author through the new relevance-ranking algorithm, and it paginates the search results.
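For readers unfamiliar with BM25, the following is a minimal, self-contained sketch of the scoring and ranking step. It does not use Gensim; it implements one common variant of the Okapi BM25 formula directly, with the usual default parameters `k1` and `b` assumed, and the tiny tokenized documents are made up for illustration.

```python
import math
from collections import Counter


def bm25_rank(query_terms, documents, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs sorted from most to least relevant.

    `documents` is a non-empty list of token lists.
    """
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for index, doc in enumerate(documents):
        term_counts = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_counts[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            # Smoothed inverse document frequency (always positive in this variant).
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append((index, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up tokenized "papers", for illustration only.
papers = [
    ["survival", "analysis", "of", "cancer", "patients"],
    ["deep", "learning", "for", "retinal", "images"],
    ["retinal", "images", "and", "stroke", "risk"],
]
ranked = bm25_rank(["retinal", "stroke"], papers)
```

The third paper matches both query terms and is ranked first, the second matches one term, and the first matches none and scores zero. In a search engine, the sorted list would then be sliced into pages for display.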
In these two wonderful months, I gained a great deal of knowledge that I had not previously learned in class. My self-learning ability improved greatly, especially in computer programming. What impressed me most was the friendliness of the CCRB staff, who offered constructive suggestions on both work and other matters, which greatly benefited me. In particular, Dr Jack Lee and Mr Steven Yuk Fai Lau gave me a clearer understanding of my current work and future study plan. I am very grateful for their help and advice.
In addition, I would like to thank Professor Lin Yuanyuan of the Department of Statistics, and Prof. Benny Zee and Ms Maria Ming Po Lai of the CCRB, for their help and support during my internship programme, which enabled me to complete my work successfully despite the COVID-19 epidemic in Hong Kong.
Through this programme, I now know how to learn more effectively and have also greatly improved my learning ability in fields I was not familiar with or even aware of. I am very grateful to the Statistics Department and the CCRB for providing me with this excellent experience, which will be really helpful to my future studies and career.
XING, Qianyu, BSc in Statistics
Working at the CCRB as a Junior Research Assistant was an inspiring experience. I was honored to work with Professor Zee and the CCRB staff, who provided helpful guidance on my real-world project and encouraged me to learn to use productivity tools.
My first task was to organize the questionnaire information that had been collected by colleagues. I had seldom worked on raw data before, as my homework had usually provided cleaned datasets. I recognized that every record was precious, and became more cautious when cleaning data. I also noticed that I was weak at reading other people’s handwriting, a skill I needed to strengthen to become an effective team member. The second task, analyzing the data and writing a report, was my first project aimed at helping others. I struggled at first because I was limited to the analytical skills I had learned from courses, and I attempted to apply statistical learning methods too early. Professor Zee helped me to understand that the project should start with a simple description and summary, then progress to potentially answerable problems of scientific interest. My progress became smoother after I completed a research plan with selected problems. During the project, I read related papers to learn their analysis methods and reporting style.
Regrettably, I worked in the office with my colleagues for only seven days due to COVID-19. However, I appreciate that this internship helped me adapt to working remotely. As Ms Lai suggested, I made short PowerPoint slides to help present my ideas during Zoom meetings, and reported my progress by email whenever I had new observations to share.
It was a pleasure to have the opportunity to work at the CCRB through the department’s Summer Internship Programme. During this internship I recognized both my strengths and my weaknesses, gained an introductory view of the application of statistics, and refined my career plans.
AU, Yui Ki, BSc in Risk Management Science
I was grateful for the opportunity to work as an intern at the New Media Group this summer, where I gained valuable experience. I was assigned to the Web and Application Platform Team, which is responsible for the company’s IT and database management.
During the internship, I was required to complete three main tasks. The first was to study the performance of the recommendation engine and propose a way to improve the algorithm. My second assignment was to optimize the article length for different business units. After collecting the completion rates of different articles and analyzing current article performance, I proposed an article length for each business unit to improve the overall completion rate. My final task was to improve the click-through rate of the app’s push notifications. I ran a multivariate regression analysis to find out how different factors, such as time, frequency and topic, affected the click-through rate, and made some suggestions based on the results. These three tasks allowed me to apply the theories I had learned in lectures to real-life scenarios, which provided a great opportunity to improve my practical statistical skills.
I would like to express my gratitude to the Department of Statistics and New Media Group for offering me this internship opportunity. I enjoyed my work at the New Media Group, and my colleagues were very friendly. I also learned a lot from this internship programme. In addition to practical statistical skills, my presentation and problem-solving skills improved greatly. I believe this real-life work experience will prove very helpful in my future career.