Angular GitHub Commits

This is a real-world object-centric event log containing information on the commits in the GitHub repository of the Angular project.

The dataset can be downloaded from Zenodo.

Overview

This real-world event log contains an extraction of the commit information from the GitHub repository used to develop the Angular platform. Each code commit in the repository is abstracted as one event in the log. The dataset contains essential information for each commit, such as its timestamp and the contributor’s details. Crucially, commit information is connected to two classes of objects: the file(s) affected by the commit, and the branch(es) of the repository containing the commit.

Description

GitHub, a popular platform for developers that offers the functionality of the Git version control system, allows contributors to record individual modifications to software projects; such modifications are grouped in units called commits. A commit contains all the details of the edits made to a group of files in the project. Therefore, the commits of a project together constitute a ledger that allows one to rewind or fast-forward through all contributions to the project.

Commits in a project are arranged in branches, which form a tree-like structure. A contributor may create a new branch, essentially a copy of the project, in order to commit modifications safely. Once satisfied with the edits, the contributor may merge the new branch back into the pre-existing one; the merge combines the modifications introduced by the new commits and requires resolving any conflicts that may arise.

This log contains an extraction of the commit information of the Angular project on GitHub. At this level of abstraction, every commit corresponds to exactly one event in the log.

For each event, the following information is recorded:

  • a unique identifier (hash)
  • the author timestamp of the commit (including timezone information)
  • an activity label: the Angular project follows the Conventional Commits specification, which requires each commit message to start with a type identifier (e.g., fix or feat). This makes it possible to derive clean activity labels. Some labels were corrected by hand (for instance, to fix typos)
  • the message of the commit
  • the contributor’s name
  • the contributor’s email (resource)
  • a merge flag; True if the commit is a merge, False otherwise
  • information related to the files edited by the commit (in case of renames, we track the new name)
  • information related to the branches in which the commit appears
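As an illustration of how a Conventional Commits header can yield an activity label, the following minimal sketch extracts the type token from the first line of a commit message. It assumes the label is the type that precedes the optional scope; the typo map TYPO_FIXES is hypothetical, and the actual cleaning applied to this dataset is not described here.

```python
import re
from typing import Optional

# Conventional Commits header: type, optional "(scope)", optional "!", then ": description".
# Example: "fix(router): handle trailing slashes"
HEADER_RE = re.compile(r"^(?P<type>[A-Za-z]+)(?:\((?P<scope>[^)]*)\))?(?P<breaking>!)?: ")

# Hypothetical hand-made corrections for misspelled types (not the dataset's real list).
TYPO_FIXES = {"fxi": "fix", "feature": "feat"}


def activity_label(message: str) -> Optional[str]:
    """Return the lower-cased commit type from the first line of a commit message,
    or None if the header does not follow the Conventional Commits shape."""
    lines = message.splitlines()
    header = lines[0].strip() if lines else ""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    label = match.group("type").lower()
    # Map known typos onto their intended type, otherwise keep the label as-is.
    return TYPO_FIXES.get(label, label)
```

Messages that do not match the header shape (for instance, default Git merge messages such as "Merge branch ...") return None in this sketch; how the dataset labels such commits is not specified here.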

Files and branches are two distinct object types in this log. Note that a commit might not be associated with any file, whereas every commit appears in at least one branch.
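The constraint on object associations can be checked with a small sketch. It assumes event-to-object relations are available as (event id, object id, object type) triples; the type names "file" and "branch" are placeholders and not necessarily the names used in the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# One relation per (event id, object id, object type) triple.
Relation = Tuple[str, str, str]


def check_commit_objects(event_ids: Iterable[str], relations: Iterable[Relation],
                         file_type: str = "file",
                         branch_type: str = "branch") -> Dict[str, Set[str]]:
    """Return the commits linked to no file and the commits linked to no branch."""
    # Collect, for every event, the set of object types it is related to.
    types_per_event: Dict[str, Set[str]] = defaultdict(set)
    for event_id, _object_id, object_type in relations:
        types_per_event[event_id].add(object_type)
    events = list(event_ids)
    return {
        "no_file": {e for e in events if file_type not in types_per_event[e]},
        "no_branch": {e for e in events if branch_type not in types_per_event[e]},
    }
```

With a log loaded in PM4Py, the triples could be taken from the relations table, e.g. `ocel.relations[[ocel.event_id_column, ocel.object_id_column, ocel.object_type_column]].itertuples(index=False)`. According to the description above, "no_branch" should come out empty for this log, while "no_file" may not.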

This event log has been extracted with the help of PyDriller.
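A minimal sketch of how such an extraction might look with PyDriller, mapping each commit to one flat event record. The record keys are hypothetical and the actual extraction script is not part of this description; the attributes read (hash, msg, author, author_date, merge, modified_files, branches) are those of PyDriller's Commit class.

```python
from typing import Any, Dict, Iterator


def commit_to_event(commit: Any) -> Dict[str, Any]:
    """Map one PyDriller Commit to a flat event record (one commit = one event).
    The keys of the returned dict are illustrative, not the dataset's attribute names."""
    return {
        "id": commit.hash,
        "timestamp": commit.author_date,      # timezone-aware datetime in PyDriller
        "message": commit.msg,
        "author_name": commit.author.name,
        "author_email": commit.author.email,  # used as the resource
        "is_merge": commit.merge,
        # new_path holds the new name after a rename; it is None for deleted
        # files, in which case this sketch falls back to old_path.
        "files": [m.new_path or m.old_path for m in commit.modified_files],
        "branches": sorted(commit.branches),
    }


def extract_events(repo_path: str) -> Iterator[Dict[str, Any]]:
    """Yield one event record per commit of the repository at repo_path."""
    from pydriller import Repository  # pip install pydriller

    for commit in Repository(repo_path).traverse_commits():
        yield commit_to_event(commit)
```

Computing branches and modified_files runs additional Git commands for each commit, so a full traversal of a repository of this size can take a long time.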

Properties

This event log has the following properties:

Property             Value
Events               27847
Activity Labels      67
Object Types         2
Objects (files)      35392
Objects (branches)   119
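The properties in the table can be recomputed from a loaded log. This sketch assumes the log is read with pm4py.read_ocel (available in PM4Py 2.x) and relies on the column-name attributes of PM4Py's OCEL object (event_id_column, event_activity, object_type_column); the counts it returns can be compared with the table above.

```python
from collections import Counter
from typing import Any, Dict


def log_properties(ocel: Any) -> Dict[str, Any]:
    """Count events, distinct activity labels, object types and objects per type.

    `ocel` is expected to expose PM4Py's OCEL attributes: the `events` and
    `objects` tables plus the names of their id, activity and type columns.
    """
    # Iterating a table column yields its values, so Counter tallies objects per type.
    object_types = Counter(ocel.objects[ocel.object_type_column])
    return {
        "events": len(ocel.events[ocel.event_id_column]),
        "activity_labels": len(set(ocel.events[ocel.event_activity])),
        "object_types": len(object_types),
        "objects_per_type": dict(object_types),
    }


def summarize_file(path: str) -> Dict[str, Any]:
    """Load an object-centric event log with PM4Py and summarize it."""
    import pm4py  # pip install pm4py

    return log_properties(pm4py.read_ocel(path))
```

To use it, call summarize_file with the path of the file downloaded from Zenodo.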

Get started

Download the dataset from Zenodo and place it in the folder of your Python script, or in the working directory of your console. Then install PM4Py:

pip install pm4py

To manipulate object-centric logs programmatically, use the functionality of the ocel package in the PM4Py library. Additionally, check out the tool support for object-centric event logs!

from pm4py import ocel

Authors

Marco Pegoraro

Contributing

To contribute, drop us an email! We are happy to receive your feedback.