Sharing an Interesting Data Parsing Method

This note is a development summary. It walks through an example of receiving and parsing GPS data and shares some of the thinking behind it:

GPS Data Protocol

Most commonly used GPS modules use the NMEA-0183 protocol, which has become the de facto standard output protocol for GPS navigation devices. It is distinct from the RTCM (Radio Technical Commission for Maritime Services) standards, which cover differential correction data.

NMEA-0183 is a standard defined by the National Marine Electronics Association of the United States. It specifies how marine electronic instruments communicate, including the format of the transmitted data and the protocol used to transmit it.

The protocol transmits GPS positioning information as ASCII text, in units that we call frames.

The frame format is as follows:

    $aaccc,ddd,ddd,…,ddd*hh(CR)(LF)

GPS modules output several types of frames. In practice we rarely need all of them, and we can select only the ones our application requires.

Below we take the $GPGGA frame as an example to show how it can be received and parsed.

The basic format of a $GPGGA sentence is as follows:

    $GPGGA,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>*hh<CR><LF>

Here is an example:

    $GPGGA,082006.000,3852.9276,N,11527.4283,E,1,08,1.0,20.6,M,,,,0000*35
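The two hexadecimal digits hh after the * are a checksum: the XOR of every character between $ and *, not including either. The following is a minimal sketch of that check. The function name nmea_checksum_ok is hypothetical and is not taken from the original project.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true when the two hex digits after '*' equal the XOR of all
 * characters between '$' and '*'. nmea_checksum_ok is a hypothetical name. */
bool nmea_checksum_ok(const char *frame)
{
    if (frame == NULL || frame[0] != '$') {
        return false;
    }

    const char *star = strchr(frame, '*');
    if (star == NULL ||
        !isxdigit((unsigned char)star[1]) ||
        !isxdigit((unsigned char)star[2])) {
        return false;                       /* no "*hh" after the data */
    }

    unsigned char sum = 0;
    for (const char *p = frame + 1; p < star; p++) {
        sum ^= (unsigned char)*p;           /* XOR of the payload */
    }

    char hex[3] = { star[1], star[2], '\0' };
    return strtoul(hex, NULL, 16) == sum;
}
```

Applied to the example sentence above, the XOR of the payload is 0x35, which matches the *35 at the end of the frame.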

GPS Data Reception

The GPS module communicates over a serial port, so the data must be received before it can be parsed. I receive it on an embedded Linux platform, where the interface for reading the serial port is:

    int uart_read(void *data, int data_len, long time_out);

Here are three receiving methods I have used in practice:

Method 1: Rough Method

To verify the parsing quickly and get the whole process running end to end, we can start with a rough way of obtaining the data. We ignore the actual number of bytes in a frame and simply allocate a generous buffer for parsing, such as:

    char rx_gps_data[512];

The number of bytes uart_read returns on each call depends on how long the thread is suspended between calls. We can likewise pick a rough size for the serial receive buffer, such as:

    char uart_rx_buf[64];

The contents of uart_rx_buf from each read are then appended to rx_gps_data, which is parsed afterwards.
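The splicing step could look roughly like the sketch below. It assumes the chunk passed in is whatever one uart_read call returned. The helper names gps_buf_append and gps_buf_pop_line are hypothetical, and the overflow policy (discard what is buffered) is only one possible choice.

```c
#include <stddef.h>
#include <string.h>

#define RX_GPS_DATA_LEN 512

static char   rx_gps_data[RX_GPS_DATA_LEN];  /* accumulation buffer */
static size_t rx_gps_len = 0;                /* bytes currently stored */

/* Append one chunk from the serial port. If it does not fit, the data
 * already buffered is discarded first, so frames in it are lost. */
void gps_buf_append(const char *chunk, size_t len)
{
    if (len > RX_GPS_DATA_LEN) {
        return;                              /* chunk can never fit */
    }
    if (rx_gps_len + len > RX_GPS_DATA_LEN) {
        rx_gps_len = 0;
    }
    memcpy(rx_gps_data + rx_gps_len, chunk, len);
    rx_gps_len += len;
}

/* Copy the first complete line (terminated by '\n') into out without its
 * CR/LF and remove it from the buffer. Returns 1 if a line was extracted,
 * 0 if no complete line is buffered yet. */
int gps_buf_pop_line(char *out, size_t out_size)
{
    char *lf = memchr(rx_gps_data, '\n', rx_gps_len);
    if (lf == NULL || out_size == 0) {
        return 0;
    }

    size_t line_len = (size_t)(lf - rx_gps_data) + 1;   /* includes '\n' */
    size_t copy_len = line_len - 1;                      /* drop '\n' */
    if (copy_len > 0 && rx_gps_data[copy_len - 1] == '\r') {
        copy_len--;                                      /* drop '\r' */
    }
    if (copy_len >= out_size) {
        copy_len = out_size - 1;                         /* truncate */
    }
    memcpy(out, rx_gps_data, copy_len);
    out[copy_len] = '\0';

    /* Shift the remaining bytes to the front of the buffer. */
    memmove(rx_gps_data, rx_gps_data + line_len, rx_gps_len - line_len);
    rx_gps_len -= line_len;
    return 1;
}
```

In a receive loop, each chunk would be passed to gps_buf_append, and gps_buf_pop_line would then be called repeatedly until it returns 0.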

The advantage of the rough method is that it lets us verify the parsing quickly and run through the entire process. The disadvantage is that if the sizes of uart_rx_buf and rx_gps_data are poorly chosen, many data frames may be truncated or corrupted.

I generally prefer to get the whole process working first and then optimize it step by step.

Method 2: State Machine Method

The rough method above may corrupt some data frames, and its code structure may not be clear enough. To address both problems, receive with a state machine: read byte by byte and parse once a complete frame has been received.

For example:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    // All states of GGA (GGA data example: $GPGGA,023543.00,2308.28715,N,11322.09875,E,1,06,1.49,41.6,M,-5.3,M,,*7D)
    #define GGA_STATE_START    0 // $
    #define GGA_STATE_HEAD1_G  1 // G
    #define GGA_STATE_HEAD2_P  2 // P
    #define GGA_STATE_HEAD3_G  3 // G
    #define GGA_STATE_HEAD4_G  4 // G
    #define GGA_STATE_HEAD5_A  5 // A
    #define GGA_STATE_DATA     6 // ,023543.00,2308.28715,N,11322.09875,E,1,06,1.49,41.6,M,-5.3,M,,*
    #define GGA_STATE_CHECK0   7 // 7
    #define GGA_STATE_CHECK1   8 // D

    // Assumed buffer definition; the original listing does not show it.
    #define GGA_DATA_MAX_LEN   128
    static char rx_gps_gga_data[GGA_DATA_MAX_LEN];

    static uint16_t gga_len = 0;
    static uint8_t gga_state = GGA_STATE_START;

    static void gps_gga_data_get(char in_data)
    {
        switch (gga_state)
        {
        case GGA_STATE_START:
            if ('$' == in_data)
            {
                // Start of a new frame: clear the buffer and store '$'
                gga_len = 0;
                memset(rx_gps_gga_data, 0, GGA_DATA_MAX_LEN);
                rx_gps_gga_data[gga_len++] = in_data;
                gga_state = GGA_STATE_HEAD1_G;
            }
            break;
        case GGA_STATE_HEAD1_G:
            if ('G' == in_data)
            {
                rx_gps_gga_data[gga_len++] = in_data;
                gga_state = GGA_STATE_HEAD2_P;
            }
            else
            {
                gga_state = GGA_STATE_START;
            }
            break;
        case GGA_STATE_HEAD2_P:
            if ('P' == in_data)
            {
                rx_gps_gga_data[gga_len++] = in_data;
                gga_state = GGA_STATE_HEAD3_G;
            }
            else
            {
                gga_state = GGA_STATE_START;
            }
            break;
        case GGA_STATE_HEAD3_G:
            if ('G' == in_data)
            {
                rx_gps_gga_data[gga_len++] = in_data;
                gga_state = GGA_STATE_HEAD4_G;
            }
            else
            {
                gga_state = GGA_STATE_START;
            }
            break;
        case GGA_STATE_HEAD4_G:
            if ('G' == in_data)
            {
                rx_gps_gga_data[gga_len++] = in_data;
                gga_state = GGA_STATE_HEAD5_A;
            }
            else
            {
                gga_state = GGA_STATE_START;
            }
            break;
        case GGA_STATE_HEAD5_A:
            if ('A' == in_data)
            {
                rx_gps_gga_data[gga_len++] = in_data;
                gga_state = GGA_STATE_DATA;
            }
            else
            {
                gga_state = GGA_STATE_START;
            }
            break;
        case GGA_STATE_DATA:
            // Check before storing, keeping room for the two checksum
            // characters and the terminating NUL.
            if (gga_len >= GGA_DATA_MAX_LEN - 3)
            {
                gga_state = GGA_STATE_START; // frame too long: drop it
                break;
            }
            rx_gps_gga_data[gga_len++] = in_data;
            if ('*' == in_data)
            {
                gga_state = GGA_STATE_CHECK0;
            }
            break;
        case GGA_STATE_CHECK0:
            rx_gps_gga_data[gga_len++] = in_data;
            gga_state = GGA_STATE_CHECK1;
            break;
        case GGA_STATE_CHECK1:
            // Second checksum character: the frame is complete
            rx_gps_gga_data[gga_len++] = in_data;
            printf("gga data : %s\n", rx_gps_gga_data);
            gga_state = GGA_STATE_START;
            break;
        default:
            gga_state = GGA_STATE_START;
            break;
        }
    }

In this way the GGA data can be received completely. Once the GGA_STATE_CHECK1 state has been processed, rx_gps_gga_data holds one complete GGA frame, which can then be parsed. At this point a flag variable can be set to indicate that a complete frame has been received, so that parsing only starts once the frame is complete.

This method receives data reliably and is very useful on microcontrollers. However, with the same thread suspension time, each uart_read obtains only one byte, which reduces receiving throughput to some extent: we gain in one place and lose in another.
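If the serial interface can return several bytes per call, one option is to keep the state machine but feed it a whole chunk at a time. This is an assumption about the setup and not something the original code does. The sketch below is a more compact variant of the state machine above that matches the header against a string instead of using one state per character. The names gga_rx_t, gga_rx_byte, and gga_rx_chunk are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define GGA_FRAME_MAX 128

typedef struct {
    char   buf[GGA_FRAME_MAX]; /* frame being assembled; NUL-terminated when done */
    size_t len;                /* characters stored so far */
    int    tail;               /* checksum characters still expected, -1 before '*' */
} gga_rx_t;

static const char GGA_HEAD[] = "$GPGGA";
#define GGA_HEAD_LEN (sizeof(GGA_HEAD) - 1)

/* Feed one byte. Returns 1 when rx->buf holds a complete "$GPGGA...*hh" frame. */
int gga_rx_byte(gga_rx_t *rx, char c)
{
    if (rx->len < GGA_HEAD_LEN) {            /* still matching the header */
        if (c != GGA_HEAD[rx->len]) {
            rx->len = 0;                     /* mismatch: start over ... */
            if (c != GGA_HEAD[0]) {
                return 0;                    /* ... unless c is a new '$' */
            }
        }
        rx->buf[rx->len++] = c;
        rx->tail = -1;
        return 0;
    }
    if (rx->len >= GGA_FRAME_MAX - 1) {      /* overlong frame: drop it */
        rx->len = 0;
        return 0;
    }
    rx->buf[rx->len++] = c;
    if (rx->tail < 0) {                      /* payload, waiting for '*' */
        if (c == '*') {
            rx->tail = 2;
        }
        return 0;
    }
    if (--rx->tail > 0) {                    /* first checksum character */
        return 0;
    }
    rx->buf[rx->len] = '\0';                 /* second one: frame complete */
    rx->len = 0;
    return 1;
}

/* Feed a whole chunk, for example the bytes returned by one serial read.
 * Calls on_frame (if not NULL) for each complete frame and returns the
 * number of frames completed. */
int gga_rx_chunk(gga_rx_t *rx, const char *chunk, size_t len,
                 void (*on_frame)(const char *frame))
{
    int frames = 0;
    for (size_t i = 0; i < len; i++) {
        if (gga_rx_byte(rx, chunk[i])) {
            frames++;
            if (on_frame != NULL) {
                on_frame(rx->buf);
            }
        }
    }
    return frames;
}
```

A zero-initialized gga_rx_t is a valid starting state. Because the state is kept between calls, a frame that is split across two reads is still assembled correctly.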

In our application this conflicted with the timing requirements of the algorithm, so we had to find another approach. Let's take a look at Method 3.

Method 3: Timestamp Method

This method requires knowing what data each frame contains and how often it is output. With the same thread suspension time, first make the buffer passed to uart_read somewhat larger, then observe the maximum number of bytes a single read returns and how many reads it takes to receive one complete frame.

We can then use time to distinguish frames from the serial-port packets that make them up, and reassemble the packets where necessary.

For example: frames are 200 ms apart, the thread suspension time is 10 ms, one frame has 130 bytes, and one frame arrives as two serial-port packets (packet 1 and packet 2).

A timestamp on each packet tells us whether the gap before it is the interval between two frames or the interval between two packets of the same frame. The packets can then be handled accordingly so that the data is received correctly.
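The reassembly logic could be sketched as follows. The 50 ms threshold is an assumption chosen to lie between the 10 ms packet gap and the 200 ms frame gap of the example. The names frame_asm_t and frame_asm_step are hypothetical, and the caller is assumed to supply a millisecond timestamp from its own clock.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BUF_LEN 256
#define FRAME_GAP_MS  50   /* assumed: longer than a packet gap, shorter than a frame gap */

typedef struct {
    char     buf[FRAME_BUF_LEN]; /* packets of the frame being assembled */
    size_t   len;                /* bytes stored so far */
    uint32_t last_ms;            /* arrival time of the most recent packet */
} frame_asm_t;

/* Call once per receive cycle with the current time and whatever the
 * serial read returned (len may be 0). When at least FRAME_GAP_MS has
 * passed since the last packet, the buffered packets are treated as one
 * complete frame: it is copied to out as a NUL-terminated string and 1 is
 * returned. Otherwise 0 is returned. */
int frame_asm_step(frame_asm_t *fa, uint32_t now_ms,
                   const char *pkt, size_t len,
                   char *out, size_t out_size)
{
    int completed = 0;

    /* A long silence after buffered data marks the end of a frame. */
    if (fa->len > 0 && out != NULL && out_size > 0 &&
        (uint32_t)(now_ms - fa->last_ms) >= FRAME_GAP_MS) {
        size_t n = fa->len < out_size - 1 ? fa->len : out_size - 1;
        memcpy(out, fa->buf, n);
        out[n] = '\0';
        fa->len = 0;
        completed = 1;
    }

    /* Append the new packet, which belongs to the current frame. */
    if (len > 0) {
        size_t room = FRAME_BUF_LEN - fa->len;
        if (len > room) {
            len = room;              /* excess bytes are dropped */
        }
        memcpy(fa->buf + fa->len, pkt, len);
        fa->len += len;
        fa->last_ms = now_ms;
    }
    return completed;
}
```

Calling the function even when nothing was read lets a frame be reported as soon as the gap has elapsed, instead of waiting for the first packet of the next frame.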

GPS Data Parsing

How do we parse GPS data?

There are many possible methods. Let's first look at the approach used by Zhengdian Atom (ALIENTEK):

It can be roughly divided into two steps. The first step finds the position of a comma to locate the field to be parsed. The second step converts the string in that field into a number.
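The two steps can be illustrated with the sketch below. It is not the vendor's code: the helper names nmea_field and nmea_field_to_double are hypothetical, and strtod is used here for the string-to-number step.

```c
#include <stddef.h>
#include <stdlib.h>

/* Step 1: return a pointer to the first character after the n-th comma
 * (n >= 1), or NULL if there are fewer than n commas before the '*'. */
const char *nmea_field(const char *sentence, int n)
{
    for (const char *p = sentence; *p != '\0' && *p != '*'; p++) {
        if (*p == ',' && --n == 0) {
            return p + 1;
        }
    }
    return NULL;
}

/* Step 2: convert field n to a double. For an empty, missing, or
 * non-numeric field, 0.0 is returned and *ok (if not NULL) is set to 0. */
double nmea_field_to_double(const char *sentence, int n, int *ok)
{
    const char *field = nmea_field(sentence, n);
    char *end = NULL;
    double value = 0.0;
    int good = 0;

    if (field != NULL) {
        value = strtod(field, &end);   /* stops at the next comma */
        good = (end != field);         /* no digits consumed: not a number */
    }
    if (ok != NULL) {
        *ok = good;
    }
    return good ? value : 0.0;
}
```

With the $GPGGA example from earlier, field 2 is the latitude string 3852.9276 and field 4 is the longitude string 11527.4283.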

Here is a simple and practical parsing method. The idea is similar to the above, but it is simpler and clearer:

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Defined elsewhere in the project and not shown in the original:
     * st_gps_gga_def, data_check(), GGA_STR_MAX, the STR_* field indices,
     * and gps_gga_str, assumed here to be char gps_gga_str[GGA_STR_MAX][16]. */
    static bool gps_gga_data_parse(st_gps_gga_def *out_data, char *in_data)
    {
        bool ret = false;
        char *p_gga = in_data;

        if (NULL == out_data || NULL == p_gga)
        {
            return ret;
        }

        /* Multi-constellation modules use the talker ID GN ("$GNGGA") instead. */
        if (NULL != (p_gga = strstr(p_gga, "$GPGGA")))
        {
            printf("gga data : %s\n", p_gga);
            /* Data verification */
            if (data_check(p_gga))
            {
                printf("gga data check success!\n");
                /* Parse the string */
                printf("gga data parse: \n");
                for (int i = 0; i < GGA_STR_MAX; i++)
                {
                    /* %[^,] matches nothing on an empty field, so clear the slot first */
                    gps_gga_str[i][0] = '\0';
                    sscanf(p_gga, "%15[^,]", gps_gga_str[i]);
                    printf("%s\n", gps_gga_str[i]);
                    /* Move to the next comma and step over it */
                    p_gga += strcspn(p_gga, ",");
                    if (',' == *p_gga)
                    {
                        p_gga++;
                    }
                }
                /* Convert string to number */
                out_data->latitude = atof(gps_gga_str[STR_LATITUDE]);
                out_data->longitude = atof(gps_gga_str[STR_LONGITUDE]);
                out_data->time = atof(gps_gga_str[STR_TIME]);
                out_data->quality = atoi(gps_gga_str[STR_QUALITY]);
                ret = true;
            }
            else
            {
                printf("gga data check error!\n");
            }
        }
        return ret;
    }

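Here sscanf is used with a scanset conversion for parsing. This is often loosely called "sscanf + regular expression", but %[...] is a scanf conversion specifier and not a true regular expression.

    sscanf(p_gga, "%[^,]", gps_gga_str[i]);

The sscanf function is very useful for parsing strings. The %[^,] specifier above takes the characters of p_gga up to, but not including, the first comma and stores them in gps_gga_str[i]. Because GGA fields are separated by commas, looping over the fields parses out all of the data, which is very convenient. Note that %[^,] matches nothing when a field is empty, so the return value of sscanf should be checked or the destination cleared beforehand.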
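As a self-contained illustration of this loop, the sketch below splits a sentence into fields. The function name nmea_split and the sizes FIELD_MAX and FIELD_LEN are assumptions made for the example. It clears each slot first so that empty fields come out as empty strings.

```c
#include <stdio.h>
#include <string.h>

#define FIELD_MAX 20
#define FIELD_LEN 16

/* Split a comma-separated sentence into fields with the %[^,] scanset.
 * Returns the number of fields stored. Empty fields become "". */
int nmea_split(const char *s, char fields[FIELD_MAX][FIELD_LEN])
{
    int n = 0;
    while (n < FIELD_MAX) {
        fields[n][0] = '\0';
        /* %15[^,] copies at most 15 characters that are not commas. On an
         * empty field it matches nothing and leaves the slot untouched. */
        sscanf(s, "%15[^,]", fields[n]);
        n++;
        s += strcspn(s, ",");   /* skip the whole field, even if truncated */
        if (*s != ',') {
            break;              /* end of the sentence */
        }
        s++;                    /* step over the comma */
    }
    return n;
}
```

For the $GPGGA example from earlier, this yields 15 fields: the sentence ID, the 14 data fields, with the checksum still attached to the last one.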

Resources for learning regular expressions:

    1. https://deerchao.cn/tutorials/regex/regex.htm
    2. https://www.runoob.com/regexp/regexp-syntax.html

Let's take a look at some simple uses of sscanf with field widths and scansets:

1. Get a string of a specified length.

For example, the following takes a string of at most 4 characters.

    sscanf("123456 ", "%4s", str);

2. Get the string up to a specified character.

In the following example, the string ends when a space is encountered.

    sscanf("123456 abcdedf", "%[^ ]", str);

3. Get a string that contains only a specified character set.

For example, the following takes a string containing only the digits 1 to 9 and lowercase letters.

    sscanf("123456abcdedfBCDEF", "%[1-9a-z]", str);

4. Get the string up to a specified character set.

For example, the following takes characters until an uppercase letter is encountered.

    sscanf("123456abcdedfBCDEF", "%[^A-Z]", str);
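The four calls can be checked together with a small sketch like the one below. The function name scanset_demo is hypothetical, and a width limit is added to each scanset so that the destination buffers cannot overflow. The ranges 1-9, a-z, and A-Z rely on the common behavior of a hyphen inside a scanset (as in glibc), which the C standard leaves implementation-defined.

```c
#include <stdio.h>
#include <string.h>

/* Runs the four examples and stores each result in out[0..3]. */
void scanset_demo(char out[4][32])
{
    memset(out, 0, 4 * 32);

    /* 1. At most 4 characters. */
    sscanf("123456 ", "%4s", out[0]);

    /* 2. Everything up to the first space. */
    sscanf("123456 abcdedf", "%31[^ ]", out[1]);

    /* 3. Only the digits 1 to 9 and lowercase letters. */
    sscanf("123456abcdedfBCDEF", "%31[1-9a-z]", out[2]);

    /* 4. Everything up to the first uppercase letter. */
    sscanf("123456abcdedfBCDEF", "%31[^A-Z]", out[3]);
}
```

The results are "1234", "123456", "123456abcdedf", and "123456abcdedf" respectively.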

Simple, easy-to-understand sscanf format strings of this kind can make parsing string data very convenient. Complex ones are not recommended because they make the code too hard to read.

It is also worth writing a comment wherever this technique is used. If you have seen it before it is easy to follow, but someone who reads your code later may never have come across it and may not understand it right away.

When I was an intern in my junior year, I saw code like this at the company. My knowledge was limited at the time, it was the first time I had seen sscanf used this way, and I could not find an explanation when I searched for one. So it is worth making a habit of writing comments, which benefits both others and yourself.

References:

1. "ATK-NEO-6M GPS Module" documentation from Zhengdian Atom (ALIENTEK).

2. https://blog.csdn.net/absurd/article/details/1177092

This article is reprinted from the WeChat public account "Embedded Mixed Potpourri". To reprint this article, please contact the Embedded Mixed Potpourri public account.
