Technical University of Denmark

The Kidmose CANid Dataset (KCID)

Version 2 2025-11-03, 07:48
Version 1 2025-10-30, 12:12
dataset
posted on 2025-11-03, 07:48 authored by Brooke Elizabeth Kidmose, Andreas Brasen Kidmose
<h2><b>Kidmose CANid Dataset (KCID)</b></h2><p dir="ltr"><br></p><p dir="ltr">The <b>Kidmose CANid Dataset (KCID)</b> contains CAN bus data collected by Brooke and Andreas <b>Kidmose</b> from 16 different drivers across 4 different vehicles. This dataset is designed to support driver identification and authentication research.</p><p dir="ltr">The term "CANid" reflects the dataset's dual purpose: data collected from the <b>CAN</b> bus for driver <b>id</b>entification research.</p><h2>VEHICLES</h2><p dir="ltr">The dataset includes data from four different vehicles across various manufacturers and model years:</p><ul><li><b>2011 Chevrolet Traverse</b> - 5-door full-size SUV crossover, AWD, 8 drivers (8 unique drivers in single-driver traces; 1 additional driver in a mixed trace)</li><li><b>2017 Ford Focus</b> - 5-door compact station wagon, FWD, 4 drivers</li><li><b>2017 Subaru Forester</b> - 5-door compact SUV crossover, AWD, 6 drivers (6 unique drivers in single-driver traces; 3 additional drivers in mixed traces)</li><li><b>2022 Honda CR-V Touring</b> - 5-door compact SUV crossover, AWD, 1 driver</li></ul><p dir="ltr"><i>Note:</i> The number of drivers includes volunteer drivers whose data was captured in single-driver traces, where we know who was driving at all times. 
We exclude volunteer drivers whose data is only available in mixed traces because we do not know when each specific driver was actually operating the vehicle.</p><h2>DRIVERS</h2><p dir="ltr">The dataset includes 16 drivers across different demographic categories:</p><p dir="ltr"><b>Male Drivers:</b></p><ul><li><b>Under 30 years:</b> 4 drivers ("male-under30-1" through "male-under30-4")</li><li><b>30-55 years:</b> 4 drivers ("male-30-55-1" through "male-30-55-4")</li><li><b>Over 55 years:</b> 3 drivers ("male-over55-1" through "male-over55-3")</li></ul><p dir="ltr"><b>Female Drivers:</b></p><ul><li><b>All ages:</b> 5 drivers ("female-all-ages-1" through "female-all-ages-5")</li></ul><p dir="ltr"><i>Driver Directory Structure:</i> Driver identifiers are used as directory/folder names. Within each directory, you will find traces collected from that particular driver, with additional information (location, data collection method, etc.) specified in the filename.</p><p dir="ltr"><i>Note:</i> We use "unknown driver(s)" in directory names when we know that one or more volunteer drivers were operating the vehicle, but we cannot identify who was driving or when. We used a standalone data logger for some data collection sessions. Whenever we failed to download the data and clear the logger's memory before switching drivers, the result was a mixed trace and, occasionally, an "unknown driver(s)" entry. 
Unfortunately, some of our volunteer drivers were short-term visitors, so we did not have the opportunity to redo their traces as single-driver traces.</p><h2>LOCATIONS</h2><p dir="ltr">Data collection took place across multiple locations:</p><ul><li><b>DK</b> - Denmark</li><li><b>USA</b> - United States of America</li><li><ul><li><b>FL</b> - Florida</li><li><b>NE</b> - Nebraska</li><li><b>NE-to-FL</b> - Trip from Nebraska to Florida</li><li><b>TN</b> - Tennessee</li><li><b>TN-to-NE</b> - Trip from Tennessee to Nebraska</li></ul></li></ul><p dir="ltr">Location codes appear in filenames (e.g., <i>USA-FL-CANEdge-00000001.mf4</i> indicates data collected in Florida, USA).</p><h2>DATA COLLECTION METHODS</h2><p dir="ltr">Three different data collection methods were employed:</p><ul><li><b>CANEdge</b> - CSS Electronics CANEdge2: Standalone data logger that connects to the OBD-II port and logs to an SD card</li><li><b>Korlan</b> - Korlan USB2CAN: CAN-to-USB cable connecting the vehicle's OBD-II port to a laptop</li><li><b>Kvaser</b> - Kvaser Hybrid CAN-LIN: CAN-to-USB cable connecting the vehicle's OBD-II port to a laptop</li></ul><p dir="ltr">The data collection method is indicated in filenames (e.g., <i>USA-FL-CANEdge-00000001.mf4</i>).</p><h2>FILE TYPES</h2><p dir="ltr">The dataset provides data in three formats to support different use cases:</p><p dir="ltr"><b>.mf4 (MDF4) Format:</b> Measurement Data Format version 4 (MDF4)</p><ul><li>Binary format standardized by the Association for Standardization of Automation and Measuring Systems (ASAM)</li><li><b>Advantages:</b> Compact size, popular with automotive/CAN tools</li><li><b>Use case:</b> Native output format of the CSS Electronics CANEdge2</li><li><b>Reference:</b> <a href="https://www.csselectronics.com/pages/mf4-mdf4-measurement-data-format" rel="noreferrer" target="_blank">https://www.csselectronics.com/pages/mf4-mdf4-measurement-data-format</a></li></ul><p dir="ltr"><b>.log Format:</b> Text-based log format</p><ul><li><b>Compatibility:</b> 
Linux SocketCAN can-utils</li><li><b>Advantages:</b> Compatibility with SocketCAN can-utils; if a .log file is replayed, then data can be captured and monitored using Python's python-can library</li><li><b>References:</b> <a href="https://github.com/linux-can/can-utils" rel="noreferrer" target="_blank">https://github.com/linux-can/can-utils</a>, <a href="https://packages.debian.org/sid/can-utils" rel="noreferrer" target="_blank">https://packages.debian.org/sid/can-utils</a>, <a href="https://python-can.readthedocs.io/en/stable/" rel="noreferrer" target="_blank">https://python-can.readthedocs.io/en/stable/</a></li></ul><p dir="ltr"><b>.csv Format:</b> Text-based comma-separated values (CSV) format</p><ul><li><b>Advantages:</b> Easy to load with Python using the pandas library; easy to use with Python-based machine learning frameworks (e.g., scikit-learn, Keras, TensorFlow, PyTorch)</li><li><b>Usage:</b> Load with Python pandas: pd.read_csv()</li><li><b>Reference:</b> <a href="https://pandas.pydata.org/" rel="noreferrer" target="_blank">https://pandas.pydata.org/</a></li></ul><h2>SPECIALIZED EXPERIMENTS</h2><p dir="ltr">The KCID Dataset includes five specialized experiments:</p><p dir="ltr"><b>Fixed Routes Experiment</b></p><ul><li><b>Vehicles:</b> 2011 Chevrolet Traverse, 2017 Subaru Forester</li><li><b>Drivers:</b> male-30-55-3, male-30-55-4, male-over55-1, female-all-ages-1, female-all-ages-2, female-all-ages-5</li><li><b>Location:</b> Florida, USA (specific routes)</li><li><b>Data Collection Methods:</b> CSS Electronics CANEdge2, Kvaser Hybrid CAN-LIN</li><li><b>Purpose:</b> Capture CAN traces for specific, mappable routes; eliminate route-based variations in driver authentication data (e.g., low-speed local routes vs. 
high-speed long-distance routes)</li></ul><p dir="ltr"><b>OBD Requests and Responses Experiment</b></p><ul><li><b>Vehicle:</b> 2011 Chevrolet Traverse</li><li><b>Driver:</b> female-all-ages-5</li><li><b>Location:</b> Florida, USA</li><li><b>Data Collection Method:</b> CSS Electronics CANEdge2</li><li><b>Purpose:</b> Capture OBD requests and responses</li><li><b>Arbitration IDs:</b> <i>Requests:</i> 0x7DF, <i>Responses:</i> 0x7E8</li></ul><p dir="ltr"><b>Tire Pressure Experiment</b></p><ul><li><b>Vehicle:</b> 2011 Chevrolet Traverse</li><li><b>Driver:</b> female-all-ages-5</li><li><b>Location:</b> Florida, USA</li><li><b>Data Collection Method:</b> Kvaser Hybrid CAN-LIN</li><li><b>Purpose:</b> Capture normal and low tire pressure scenarios</li><li><b>Applications:</b> Detect tire pressure issues via CAN bus analysis; develop predictive maintenance strategies</li></ul><p dir="ltr"><b>Driving Modes and Features Experiment</b></p><ul><li><b>Vehicle:</b> 2017 Ford Focus</li><li><b>Driver:</b> male-30-55-1</li><li><b>Location:</b> Denmark</li><li><b>Data Collection Method:</b> Korlan USB2CAN</li><li><b>Purpose:</b> Capture different driving (and non-driving) modes and features</li><li><b>Examples:</b> gear (park, reverse, neutral, drive, sport); headlights on/off</li></ul><p dir="ltr"><b>Stationary Vehicles Experiment</b></p><ul><li><b>Vehicles:</b> 2024 Chevrolet Malibu, 2025 Toyota Corolla</li><li><b>Driver:</b> N/A (vehicles remained stationary)</li><li><b>Location:</b> Florida, USA</li><li><b>Data Collection Method:</b> Kvaser Hybrid CAN-LIN</li><li><b>Purpose:</b> Capture CAN bus traffic from recent-model vehicles; identify differences between an older vehicle's CAN bus (e.g., 2011 Chevrolet Traverse) and a newer vehicle's CAN bus (e.g., 2024 Chevrolet Malibu)</li></ul><h2>ADDITIONAL DOCUMENTATION</h2><p dir="ltr">Each "specialized experiment" directory contains a detailed README.md file with specific information about the experiment and the data 
collected.</p><h2>RESEARCH APPLICATIONS</h2><p dir="ltr">This dataset supports various research areas:</p><ul><li>Driver authentication, driver fingerprinting</li><li>Behavioral biometrics in the automotive domain</li><li>Vehicle diagnostics and predictive maintenance</li><li>Machine learning in the automotive domain</li><li>CAN bus analysis and reverse engineering</li></ul><h2>CITATION</h2><p dir="ltr">If you use the Kidmose CANid Dataset in your research, please cite appropriately. Citation information will be updated when our paper is published in a peer-reviewed venue.</p><p dir="ltr"><b>Article Citation:</b></p><p dir="ltr"><b>APA Style:</b> Kidmose, B. E., Kidmose, A. B., and Zou, C. C. (2025). A critical roadmap to driver authentication via CAN bus: Dataset review, introduction of the Kidmose CANid Dataset (KCID), and proof of concept. <i>arXiv</i>. https://arxiv.org/pdf/2510.25856</p><p dir="ltr"><b>MLA Style:</b> Kidmose, Brooke Elizabeth, Andreas Brasen Kidmose, and Cliff C. Zou. "A Critical Roadmap to Driver Authentication via CAN Bus: Dataset Review, Introduction of the Kidmose CANid Dataset (KCID), and Proof of Concept." <i>arXiv</i>, 2025. doi:10.48550/arXiv.2510.25856</p><p dir="ltr"><b>Chicago Style:</b> Kidmose, Brooke Elizabeth, Andreas Brasen Kidmose, and Cliff C. Zou. "A Critical Roadmap to Driver Authentication via CAN Bus: Dataset Review, Introduction of the Kidmose CANid Dataset (KCID), and Proof of Concept." <i>arXiv</i> (2025). doi:10.48550/arXiv.2510.25856</p><p dir="ltr"><b>Dataset Citation:</b></p><p dir="ltr"><b>APA Style:</b> Kidmose, B. E. and Kidmose, A. B. (2025). Kidmose CANid Dataset (KCID) v1. [Data set]. Technical University of Denmark. https://doi.org/10.11583/DTU.30483005.v1</p><p dir="ltr"><b>MLA Style:</b> Kidmose, Brooke Elizabeth, and Andreas Brasen Kidmose. "Kidmose CANid Dataset (KCID) v1." Technical University of Denmark, 30 Oct. 2025. Web. {Date accessed in dd mmm yyyy format}. 
doi:10.11583/DTU.30483005.v1</p><p dir="ltr"><b>Chicago Style:</b> Kidmose, Brooke Elizabeth, and Andreas Brasen Kidmose. 2025. "Kidmose CANid Dataset (KCID) v1." Technical University of Denmark. doi:10.11583/DTU.30483005.v1</p>
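
The directory and filename conventions described under DRIVERS, LOCATIONS, and DATA COLLECTION METHODS can be turned into trace metadata with a few lines of Python. The following is a minimal sketch, not part of the dataset's own tooling. It assumes every trace name follows the layout of the one documented example (USA-FL-CANEdge-00000001.mf4), that is, a location code, then the collection method, then a suffix. The layout of Denmark filenames (for instance, a bare DK with no region) and the helper name parse_trace_name are assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath

# Method labels exactly as listed in the dataset description.
METHODS = ("CANEdge", "Korlan", "Kvaser")

# Assumed layout: <location>-<method>-<suffix>, where <location> may itself
# contain hyphens (e.g. "USA-NE-to-FL").
_NAME = re.compile(
    r"^(?P<location>.+?)-(?P<method>" + "|".join(METHODS) + r")-(?P<suffix>.+)$"
)


def parse_trace_name(path):
    """Split a trace path into driver, location, method, suffix, and extension.

    Returns None when the file name does not match the assumed layout.
    """
    p = PurePosixPath(path)
    match = _NAME.match(p.stem)
    if match is None:
        return None
    # The first hyphen-separated token is the country code; anything after it
    # (e.g. "FL" or "NE-to-FL") is treated as the region.
    country, _, region = match.group("location").partition("-")
    return {
        "driver": p.parent.name or None,  # directory name is the driver label
        "country": country,
        "region": region or None,
        "method": match.group("method"),
        "suffix": match.group("suffix"),
        "extension": p.suffix.lstrip("."),
    }


info = parse_trace_name("female-all-ages-5/USA-FL-CANEdge-00000001.mf4")
print(info)
```

Names that do not fit the assumed layout yield None rather than a guess, so unexpected files can be listed and inspected by hand.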

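
Because the .log files are described as compatible with SocketCAN can-utils, they can plausibly be read line by line without replaying them. The sketch below assumes the compact candump log layout, "(timestamp) interface ID#DATA", and handles classic data frames only. The sample lines are invented for illustration and are not taken from the dataset. The arbitration IDs 0x7DF and 0x7E8 are the ones listed for the OBD Requests and Responses Experiment.

```python
import re

# Compact candump log line, classic CAN data frame: "(ts) iface ID#HEXDATA".
_LINE = re.compile(
    r"^\((?P<ts>\d+\.\d+)\)\s+(?P<iface>\S+)\s+"
    r"(?P<id>[0-9A-Fa-f]{3,8})#(?P<data>(?:[0-9A-Fa-f]{2})*)$"
)

# Arbitration IDs named in the OBD Requests and Responses Experiment.
OBD_REQUEST_ID = 0x7DF
OBD_RESPONSE_ID = 0x7E8


def parse_candump_line(line):
    """Parse one log line into a dict, or return None if it does not match.

    Remote frames and CAN FD frames are not handled and also yield None.
    """
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": float(match.group("ts")),
        "interface": match.group("iface"),
        "arbitration_id": int(match.group("id"), 16),
        "data": bytes.fromhex(match.group("data")),
    }


def is_obd_frame(frame):
    """True for frames carrying the OBD request or response arbitration ID."""
    return frame["arbitration_id"] in (OBD_REQUEST_ID, OBD_RESPONSE_ID)


# Invented sample lines, for illustration only.
sample = [
    "(1700000000.000000) can0 7DF#0201050000000000",
    "(1700000000.010000) can0 7E8#0341057B00000000",
    "(1700000000.020000) can0 0C9#8021C0071B101000",
]
frames = [f for f in map(parse_candump_line, sample) if f is not None]
obd_frames = [f for f in frames if is_obd_frame(f)]
print(len(frames), len(obd_frames))
```

Keeping the payload as bytes leaves signal decoding to the caller, which suits reverse-engineering work where the signal layout is not known in advance.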
Funding

CyberQ Advancing cybersecurity with continuous variable quantum cryptography

Innovation Fund Denmark

