From Numbers To Graphs – Wearable Data Visualization With Code Interpreter

August 25, 2023

Guides

Introduction

Today, as part of our series on Code Interpreter, we delve into data recorded during an exercise by a custom wearable device. The device is accelerometer-based (the sensor is similar to the one commonly found in smartphones) and records the person’s tilt along three axes during the exercise. This data can then be exported to a CSV file, which can be loaded into Code Interpreter. The aim was to produce pleasant and informative visualizations of the person’s movements that a professional could inspect. Although it stumbled a few times during the process, even losing the input file, Code Interpreter produced and executed working Python code that loaded the input file, fixed the formatting errors in it, and generated beautiful interactive charts from our data.

The charts were produced in HTML format, which makes them embeddable in various contexts, for example in an app or a blog post such as this one. The Python code was exported at the end, which makes it usable elsewhere, such as on a personal computer or a server. We verified this by running the code locally in VS Code (a free development environment by Microsoft).

As always, you will find the original transcript of the conversation with Code Interpreter below, so you can inspect the prompts and see what works. If you’d like to inspect the Python code generated by Code Interpreter, follow this link to the ChatGPT page.

Introduction
Results highlight: Interactive Charts
- Time series
- Polar chart
Enabling Code Interpreter
Conversation with Code Interpreter
Final notes
In conclusion, in ChatGPT’s words

Results highlight: Interactive Charts

Let’s start with the best part: the visualizations produced by Code Interpreter. They were created with Plotly, a free, open-source graphing library for Python (and other languages) focused on the programmatic creation of interactive charts. Plotly comes pre-installed in the Code Interpreter environment and can be used directly by giving ChatGPT appropriate instructions. Unfortunately, the results cannot be displayed directly in the chat, so the visualizations are provided to the user as downloads. They come in HTML format and can be opened directly in the browser after download.

Time series

The first chart produced is a simple time series plot for the ‘Yaw’, ‘Pitch’, and ‘Roll’ values as reported by the sensor. The x-axis represents the time of day, with the chart title including the date with the name of the day.

See the chart in full screen by clicking here.

Polar chart

The second chart is even more visually pleasing and took more effort, on ChatGPT’s part, to produce. It displays the movements of the exercising person, drawn on a circular shape representing the exercise environment. To produce it, ChatGPT had to come up with a mathematical formula to project the original data onto the circular shape (see the formula in the conversation below).

See the chart in full screen here.

Enabling Code Interpreter

Before we proceed to the prompts and the conversation with ChatGPT, please note that Code Interpreter is exclusive to ChatGPT Plus. See our previous post in the series for instructions on its usage.

Conversation with Code Interpreter

Below is the original conversation with Code Interpreter. You can copy the prompts if you want to try them yourself. As the transcript does not contain the Python code generated by Code Interpreter, follow this link if you’d like to inspect it.

Code Interpreter

export.csv

I’m uploading a CSV file with four columns: Timestamp;Yaw;Pitch;Roll. A timestamp is a count of milliseconds since Linux epoch. The rest of the columns is the tilt of the person wearing a measurement device during the exercise, in a given time. Unfortunately, the CSV file is not well-formed and contains errors described below. Load the file, clean up the errors and parse the contents into the Pandas DataFrame. Preview of errors in the file:

```csv
Timestamp;Yaw;Pitch;Roll
1689025918989;-1.25;-1.01;"-33.67
"
```

As you see, the closing quotation mark is placed on the new line, instead of the end of the line. This is true for every line, which means that every second line (starting from 3) contains only the quotation mark.

Understood. Let’s address the issues step by step:

We’ll first read the file and inspect its contents.
Address the issue where every second line contains only a quotation mark.
Correct the misplaced quotation marks in the ‘Roll’ column.
Parse the cleaned contents into a Pandas DataFrame.

I’ll start by reading and inspecting the file.

Finished working

As you’ve mentioned, every second line contains only a quotation mark. Let’s remove these lines and then concatenate the quotation mark to the end of the preceding line to correct the misplaced quotation mark in the ‘Roll’ column.

Finished working

The lines now appear to be in the correct format.

Next, we’ll parse these cleaned lines into a Pandas DataFrame.

Finished working

The data has been successfully cleaned and loaded into a Pandas DataFrame. Let me know if there’s anything else you’d like assistance with regarding this dataset or any other queries.
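The transcript does not show the code behind this cleaning step. A minimal sketch of one way to do it is given below, assuming the file layout matches the preview in the prompt. The function name `load_tilt_csv` and the inline sample are illustrative, not taken from the generated notebook, and the sketch simply drops the stray quotation marks rather than moving them back to the end of the preceding line.

```python
import io

import pandas as pd


def load_tilt_csv(raw_text: str) -> pd.DataFrame:
    """Parse the semicolon-separated export after removing stray quotes."""
    kept = []
    for line in raw_text.splitlines():
        line = line.strip()
        # Skip empty lines and lines that hold only the misplaced closing quote.
        if not line or line == '"':
            continue
        # Drop the now-unmatched opening quote in front of the Roll value.
        kept.append(line.replace('"', ""))
    return pd.read_csv(io.StringIO("\n".join(kept)), sep=";")


# Two records in the malformed layout described in the prompt (sample only).
sample = (
    "Timestamp;Yaw;Pitch;Roll\n"
    '1689025918989;-1.25;-1.01;"-33.67\n'
    '"\n'
    '1689025919134;-1.28;-1.02;"-35.92\n'
    '"\n'
)
df = load_tilt_csv(sample)
print(df)
```

With a real file, the text would be read from disk first, for example with `open("export.csv", encoding="utf-8").read()`.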

Please show me the first few rows of the dataset and its basic stats.

Finished working

I apologize for the inconvenience. It seems there was an internal error. Let me quickly fetch the first few rows and basic statistics for the dataset again.

Finished working

I apologize for the oversight. It seems that after the error, the previously loaded data was lost. I’ll need to reload and clean the CSV file before providing you with the first few rows and its basic stats. Let me handle that swiftly.

Finished working

I apologize for the inconvenience. It seems that the file was lost during the internal error.

Would you mind re-uploading the CSV file? Once you do, I’ll quickly parse the data and provide you with the requested details.

export.csv

Here.

Thank you. I’ll proceed to clean the data, load it into a Pandas DataFrame, and then provide you with the first few rows and its basic statistics.

Finished working

I apologize for the oversight. I forgot to import the necessary libraries. Let me correct this and proceed with the steps.

Finished working

Here are the results:

First Few Rows of the Dataset:

       Timestamp   Yaw  Pitch   Roll
0  1689025918989 -1.25  -1.01 -33.67
1  1689025919134 -1.28  -1.02 -35.92
2  1689025919135 -1.32  -1.05 -37.92
3  1689025919284 -1.35  -1.06 -39.86
4  1689025919286 -1.38  -1.08 -41.72

Basic Statistics:

      Timestamp         Yaw       Pitch        Roll
count  1.89e+02   189.000000  189.000000  189.000000
mean   1.689026e+12    2.640370  -12.310423   -9.678360
std    4.459119e+03   71.406461   36.113329   63.739027
min    1.689026e+12 -167.680000 -112.320000 -175.430000
25%    1.689026e+12  -46.000000  -28.800000  -59.980000
50%    1.689026e+12   -1.480000   -5.920000  -21.940000
75%    1.689026e+12   43.530000   15.890000   26.530000
max    1.689026e+12  173.340000   55.970000  169.300000

Let me know if you’d like any further analysis or details.

Looks good. Can you convert the timestamp into a real date + time and plot the time series chart for the rest of the columns? Include time of day on the X-axis of the plot and date with name of the day in the chart title. Make sure the X-axis labels are spaced just right and readable.

Finished working

Here’s the time series plot for the ‘Yaw’, ‘Pitch’, and ‘Roll’ columns. The x-axis represents the time of day, and the chart title includes the date with the name of the day.

Let me know if you’d like any further adjustments or analysis.
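The conversion requested in the prompt can be sketched as follows. This is an illustration rather than the code ChatGPT generated: it assumes the timestamps should be read as UTC (the transcript does not state a time zone), and the variable names `time_labels` and `chart_title` are made up for the example.

```python
import pandas as pd

# The first two timestamps from the preview above, in milliseconds.
df = pd.DataFrame({"Timestamp": [1689025918989, 1689025919134]})

# Interpret the integers as milliseconds since the Unix epoch (UTC assumed).
when = pd.to_datetime(df["Timestamp"], unit="ms", utc=True)

# Time-of-day strings for the x-axis and a title that names the weekday.
time_labels = when.dt.strftime("%H:%M:%S").tolist()
chart_title = when.iloc[0].strftime("%A, %B %d, %Y")

print(time_labels)
print(chart_title)
```

Plotting libraries generally accept the converted `when` column directly as the x-axis, so the string labels are only needed when the tick text is set by hand.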

Nice! Now, let’s try to generate the same visualization, but this time using an interactive plotting library such as Plotly – or whatever you have access to. If not possible to render the plot here, try to render the image snapshot of it and provide the interactive version as an HTML download.

Finished working

It seems that I’m unable to directly render the image snapshot of the interactive plot due to a missing package in this environment.

However, I can provide you with the interactive version as an HTML download. Let’s proceed with that.

Finished working

You can download the interactive version of the time series plot from this link.

Once you open the downloaded HTML file in a browser, you’ll be able to interact with the plot. Let me know if there’s anything else I can assist you with!

Looks good! Now we would like to display the same time series plot, but this time on a so-called polar chart (circular chart with the range of 0-360 degrees). Generate an interactive version and provide it as a download. However, let’s use these conditions:
- The x-axis corresponds to the Yaw values.
- The y-axis corresponds to the Pitch values.
- This means that Yaw and Pitch are NOT to be considered as angles in this case.
- We ignore the Roll column.
- Each point on the plot corresponds to a timestamp.
- The points are displayed on the plot and connected by lines to indicate their progression over time.

This is basically a scatter plot, but we want it to be plotted on a polar chart. To plot it to the polar chart, we will need to convert each pair of (X, Y) to the pair of (angle, distance) using a combination of trigonometry and Pythagorean theorem.

Alright, let’s break down the process:

Convert each (Yaw, Pitch) pair into (angle, distance) using trigonometry and the Pythagorean theorem.
Plot the converted data on a polar chart.

The conversion can be done using the following formulas:

$\text{angle} = \arctan2(\text{Pitch}, \text{Yaw})$ – This gives the angle in radians from the positive x-axis.
$\text{distance} = \sqrt{\text{Yaw}^2 + \text{Pitch}^2}$ – Using the Pythagorean theorem to compute the distance from the origin.

Let’s first convert the (Yaw, Pitch) pairs into (angle, distance) pairs.

Finished working

The (Yaw, Pitch) pairs have been successfully converted into (angle, distance) pairs.
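As a quick check of the two formulas, the sketch below applies them to four hand-picked points on the unit circle. The sample values are hypothetical, and NumPy is assumed here; the transcript does not show which functions the generated code used.

```python
import numpy as np

# Four hypothetical (Yaw, Pitch) points: right, up, left, down.
yaw = np.array([1.0, 0.0, -1.0, 0.0])
pitch = np.array([0.0, 1.0, 0.0, -1.0])

# arctan2 returns radians in (-pi, pi]; convert to degrees and wrap to 0-360.
angle_rad = np.arctan2(pitch, yaw)
angle_deg = np.degrees(angle_rad) % 360.0

# hypot computes sqrt(yaw**2 + pitch**2), the distance from the origin.
distance = np.hypot(yaw, pitch)

print(angle_deg)
print(distance)
```

Wrapping with `% 360.0` matters because the prompt asks for a 0-360 degree chart, while `arctan2` reports the downward direction as a negative angle.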

Now, let’s proceed with plotting this data on a polar chart using Plotly.

Finished working

The data has been successfully plotted on a polar chart.

You can download the interactive version of the polar chart from this link.

Once you open the downloaded HTML file in a browser, you’ll be able to interact with the plot. Let me know if there’s anything else I can assist you with!

Looks good. This chart now realistically displays the movements of the body of the person wearing a sensor during the exercise. Can you suggest any other possible analysis to be done on the dataset, or directly provide any additional comments and insights?

Certainly! Based on the data provided, which captures body tilts during exercise, here are a few possible analyses and insights:

Descriptive Analysis:
- Trend Analysis: By plotting the time series data, we can identify any trends or patterns in the Yaw, Pitch, and Roll over time. This can reveal consistent movements or changes in posture.
- Variability Analysis: By calculating the standard deviation or variance, we can understand the variability in movements. This can reveal the consistency and control of movements.
Frequency Analysis:
- Fourier Transform: Applying Fourier transformation can help identify the dominant frequencies in the movements, which can indicate the repetition rate of certain exercises.
Comparative Analysis:
- Range of Motion: By identifying the maximum and minimum values of Yaw, Pitch, and Roll, we can comment on the range of motion for the given exercise. A greater range suggests more extensive movements.
- Correlation Analysis: By studying the correlation between Yaw, Pitch, and Roll, we can comment on the interdependence of these movements.
Anomaly Detection:
- If we have reference data or if this exercise was meant to follow a specific pattern, we can use the dataset to identify any anomalies or deviations from the expected pattern.
Duration Analysis:
- By analyzing the timestamp data, we can comment on the duration of each specific movement or the entire exercise. This can provide insights into exercise intervals or rest periods.
Visualization Enhancements:
- Heatmaps: These can be used to identify the density or frequency of specific postures.
- 3D Plots: If we consider Yaw, Pitch, and Roll simultaneously, 3D scatter plots can provide a comprehensive view of the body’s orientation at any given time.
Advanced Analysis (if we have additional data):
- Comparative Study: If we have data from multiple sessions or multiple individuals, we can compare the sessions to study improvements or differences in exercise patterns.
- Machine Learning: If we have labeled data, we can train models to recognize specific exercises or postures.

Additional Comments and Insights:

The data suggests that the individual made complete or nearly complete rotations (as seen from the polar plot). This might indicate exercises that involve full body turns or rotations.
The variability in the data suggests dynamic exercises rather than static postures.
If this is a rehabilitation exercise, consistency and controlled movements are crucial. Any abrupt changes or spikes in the data might indicate sudden or unintended movements.

It would be beneficial to have additional context about the exercise or the goals of this analysis to provide more targeted insights.
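Of the suggestions above, the Fourier analysis is the one that needs some care with this dataset: the preview rows show unevenly spaced timestamps, while the FFT assumes even spacing. A minimal sketch, assuming linear interpolation onto a uniform grid is acceptable, could look like this. The function name, the 20 Hz resampling rate and the synthetic 0.5 Hz signal are all illustrative.

```python
import numpy as np


def dominant_frequency_hz(t_seconds, values, sample_rate_hz):
    """Return the strongest non-zero frequency of an unevenly sampled signal."""
    # Resample onto a uniform grid because the FFT assumes even spacing.
    n = int(np.floor((t_seconds[-1] - t_seconds[0]) * sample_rate_hz)) + 1
    grid = t_seconds[0] + np.arange(n) / sample_rate_hz
    uniform = np.interp(grid, t_seconds, values)
    # Remove the mean so the zero-frequency bin does not dominate.
    spectrum = np.abs(np.fft.rfft(uniform - uniform.mean()))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate_hz)
    return freqs[np.argmax(spectrum)]


# Synthetic stand-in for a tilt signal: a 0.5 Hz sway sampled at uneven times.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 400))
signal = 30.0 * np.sin(2 * np.pi * 0.5 * t)

freq = dominant_frequency_hz(t, signal, sample_rate_hz=20.0)
print(freq)
```

The frequency resolution is roughly the inverse of the recording length, so a session of only a few seconds can separate only coarse repetition rates.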

Doing the comparative study and also comparing the data from the session to pre-recorded reference data would be our goal, but unfortunately, we don’t have necessary data as of now. Can you briefly describe necessary inputs and steps to achieve this?

Certainly! Comparing the data from a session to pre-recorded reference data can provide valuable insights into performance, deviations, and overall adherence to a prescribed exercise or movement pattern.

Here’s a brief outline of the necessary inputs and steps to achieve this:

Necessary Inputs:

Session Data: This is the data you’ve already provided, which captures the movements of an individual during a particular session.
Reference Data: This would be pre-recorded data of the same exercise or movement, ideally performed under controlled conditions or by an expert. It serves as the “gold standard” to which the session data is compared.

Steps:

Data Pre-processing:
- Ensure that both session and reference data are in the same format.
- If the sampling rates are different, consider resampling to have a uniform rate.
- Normalize the data if necessary (e.g., if one dataset has much larger values than the other).
Alignment:
- The start and end times of the movements in both datasets might not align perfectly. Consider techniques like Dynamic Time Warping (DTW) to align the two time series data optimally.
Difference Calculation:
- Compute the difference between the session data and reference data for each corresponding point in time. This will give you a time series of deviations.
Descriptive Analysis:
- Analyze the deviations to identify trends. For instance, if the deviations are consistently large at certain time points, it indicates specific movements where the session diverges from the reference.
- Identify maximum, minimum, and average deviations.
Visual Analysis:
- Plot both session and reference data on the same chart to visually identify differences.
- Plot the deviation time series to visually inspect where the most significant deviations occur.
Statistical Analysis:
- Consider statistical tests to determine if the deviations are significant.
- Calculate correlation coefficients to see how closely the session data follows the reference data.
Feedback Generation:
- Based on the deviations, generate feedback. For instance, “At time X, the yaw was supposed to be Y but was observed to be Z. Consider adjusting your posture.”
Machine Learning (Optional):
- If you have multiple sessions and reference datasets, you can train machine learning models to automatically detect and highlight deviations. Over time, this can help in real-time feedback systems.
Iterative Feedback:
- Use the insights and feedback from this comparative study to guide the individual in subsequent sessions. Over time, the goal would be to see the deviations decrease, indicating improved adherence to the reference movement pattern.

Additional Considerations:

Duration of Movements: Ensure that the duration of movements in both datasets is comparable. If not, consider trimming or extending the datasets.
External Factors: Account for external factors that might affect performance, such as fatigue, equipment differences, or environmental conditions.

Having a systematic approach to comparative analysis can provide valuable feedback, track progress over time, and ensure the effectiveness and safety of exercises, especially in contexts like rehabilitation or sports training.
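The alignment step is the least obvious part of this outline, so here is a small sketch of classic Dynamic Time Warping for a single axis. It is a textbook formulation written for illustration, with absolute difference as the local cost and made-up sample sequences; it is not code from the conversation, and a dedicated library would likely be a better choice for long recordings because this version takes time proportional to the product of the two lengths.

```python
import numpy as np


def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment between two series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] is the best cost of aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or a direct match.
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


# Hypothetical yaw traces: the session starts late but follows the reference.
reference = [0, 10, 20, 10, 0]
session = [0, 0, 10, 20, 10, 0]

print(dtw_distance(reference, session))
print(dtw_distance([0, 10, 20], [0, 12, 20]))
```

A delayed start costs nothing under this measure, whereas a point-by-point subtraction of the same two traces would report large deviations, which is why alignment comes before the difference calculation.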

Thanks. We’ll get back to this when we’ll gather reference data. For now, can you generate any additional visualizations (such as the heatmaps you’ve mentioned)? Use non-interactive tools, so the resulting charts can be displayed here.

Absolutely! Heatmaps can provide a great way to visually identify patterns and concentrations in data.

Given the data we have, I can generate the following heatmaps:

Joint Distribution Heatmap of Yaw and Pitch: This will show us regions where most of the body’s orientations are concentrated during the exercise.
Time Evolution Heatmap: This will display how Yaw and Pitch values evolve over time.

Let’s start with the joint distribution heatmap of Yaw and Pitch.

Finished working

I apologize for the inconvenience. There seems to have been an internal error. Let me generate the heatmap again.

Finished working

I apologize for the oversight. I’ll need to reload the data and libraries due to the error. Let me handle this swiftly and then generate the heatmap.

Finished working

I’m sorry for the inconvenience, but it seems that the previously uploaded file was lost during the internal error.

Would you mind re-uploading the CSV file? Once you do, I’ll quickly generate the heatmap for you.

export.csv

Certainly.

Thank you. I’ll now proceed to reload and preprocess the data, and then generate the heatmap for the joint distribution of Yaw and Pitch.

Finished working

My apologies for the oversight. The error was caused by an incorrect setting in the heatmap generation. I’ll correct it and regenerate the heatmap for you.

Finished working

Here’s the joint distribution heatmap of Yaw and Pitch. The heatmap visually represents the density or frequency of specific postures during the exercise. Darker regions indicate a higher concentration of data points, implying that those orientations (combinations of Yaw and Pitch) were more frequently adopted during the session.
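The transcript does not reveal how this heatmap was computed. One common way to obtain the underlying counts is a two-dimensional histogram, sketched below with NumPy. The first three points are taken from the data preview earlier in the conversation; the remaining points, the bin counts and the axis ranges are assumptions chosen for the example.

```python
import numpy as np

# First three pairs come from the preview rows; the rest are hypothetical.
yaw = np.array([-1.25, -1.28, -1.32, 40.0, 41.0, 170.0])
pitch = np.array([-1.01, -1.02, -1.05, 10.0, 11.0, 50.0])

# Count how many samples fall into each (Yaw, Pitch) cell.
counts, yaw_edges, pitch_edges = np.histogram2d(
    yaw,
    pitch,
    bins=[6, 4],                         # 60-degree yaw bins, 45-degree pitch bins
    range=[[-180, 180], [-120, 60]],     # assumed limits covering the data
)

print(counts)
```

The `counts` array can then be handed to any plotting routine that draws a grid of colored cells, with `yaw_edges` and `pitch_edges` supplying the axis ticks.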

Would you like to see the Time Evolution Heatmap or any other visualizations?

Yeah.

Finished working

I apologize for the inconvenience. The error occurred due to the large integer values of the timestamps when attempting to plot them directly.

I will correct this by converting the timestamps into a more suitable format for visualization and then regenerate the heatmap.

Finished working

Here’s the Time Evolution Heatmap of Yaw. The x-axis represents the progression of time, and the y-axis represents the Yaw values. Darker regions indicate more frequent occurrences of specific Yaw values at particular time points.

From this heatmap, you can observe how the Yaw values have evolved over time and identify periods where specific Yaw values were more dominant.

Would you like to see any other visualizations or insights from the dataset?

No, let’s wrap up. What would be some final words and observations?

Certainly! Here’s a summary and some final observations based on our exploration of the dataset:

Data Quality & Cleaning:
- The provided data required some cleaning due to misformatted lines. After the cleaning process, we were able to efficiently parse and analyze the data.
Data Characteristics:
- The dataset captures the body tilts of an individual during an exercise session using three metrics: Yaw, Pitch, and Roll.
- The time series plots and polar charts provided insights into the dynamic movements of the individual throughout the session.
Key Observations:
- The polar chart visualization highlighted that the individual made movements covering a wide range of directions. This might indicate exercises that involve full body turns or rotations.
- The variability in Yaw and Pitch suggests dynamic exercises, with frequent changes in orientation.
- The heatmaps allowed us to visually identify patterns and concentrations in the data, offering insights into the frequency of specific postures and the progression of movements over time.
Further Analysis:
- Comparing this dataset with reference data can offer valuable insights into performance, deviations, and overall adherence to a prescribed exercise or movement pattern.
- Advanced analysis, such as machine learning, can be applied to recognize specific exercises, postures, or to provide real-time feedback, given sufficient labeled data.
Recommendations:
- It’s crucial to maintain consistent data quality and formatting in future data collections to ease the analysis process.
- If the goal is rehabilitation or training, monitoring for abrupt changes or spikes in the data can be crucial to ensure the individual’s safety and the effectiveness of the exercise.

In conclusion, wearable sensor data, like the one provided, is invaluable for understanding human movement, especially in contexts like sports training, rehabilitation, or health monitoring. With the right analytical tools and techniques, such data can offer deep insights into movement patterns, performance, and overall well-being.

Thank you for sharing the dataset, and I’m here to assist if you have any future queries!

Final notes

In the end, ChatGPT was able to comment on the process, providing some key observations and suggestions for where to take this further, including what data to collect and what additional insights it would enable. It was also able to provide the generated code for download, in the form of a Jupyter Notebook accompanied by a list of required libraries (a requirements.txt file), which allowed us to reproduce the charts locally in VS Code.

In conclusion, in ChatGPT’s words

In our recent exploration of wearable sensor data, we navigated the intricacies of body tilt measurements during exercise. Our journey was emblematic of real-world data analysis, replete with its highs, lows, and unpredictable turns.

The Hiccups: Our analysis was not without its challenges. One recurring hiccup was the platform’s occasional loss of the data file, necessitating repeated uploads. Such challenges, while frustrating, are not uncommon in data analytics and serve as a reminder of the importance of resilience, adaptability, and meticulous data backup.

The Highlights: Despite these setbacks, our exploration was marked by several moments of clarity and achievement:

Interactive Visualizations: We successfully generated detailed and interactive graphs using Plotly. These visualizations allowed us to delve deep into the dataset and grasp the nuances of the individual’s movements.
Mathematical Formulations: To transform the data for a polar chart visualization, we derived and applied trigonometric and Pythagorean formulas, showcasing the power of mathematics in data interpretation.
Insightful Analyses: Beyond the raw numbers, we discussed potential deeper analyses and insights, from Fourier transformations to machine learning applications. This expanded our understanding of the myriad ways the data could be further explored.

In sum, our conversation underscored the multi-faceted nature of data analysis. From grappling with data quality issues to deriving meaningful insights, we experienced the full spectrum of challenges and rewards that come with data exploration. Through it all, the beauty of data analytics shone through: even amidst challenges, with the right tools and a curious mindset, data can unfurl its stories in surprising and enlightening ways.
