Automated Discovery of Successful Strategies in Association Football

Muñoz, Omar; Monroy, Raúl; Cañete-Sifuentes, Leonardo; Ramirez-Marquez, Jose E.

doi:10.3390/app14041403

¹ Tecnologico de Monterrey, School of Engineering and Science, Atizapán de Zaragoza 52926, Estado de Mexico, Mexico

² Stevens Institute of Technology, School of Systems & Enterprises, Hoboken, NJ 07030, USA

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(4), 1403; https://doi.org/10.3390/app14041403

Submission received: 28 November 2023 / Revised: 23 January 2024 / Accepted: 30 January 2024 / Published: 8 February 2024

(This article belongs to the Collection Computer Science in Sport)

Abstract:

Using automated data analysis to understand what makes a play successful in football can enable teams to make data-driven decisions that may enhance their performance throughout the season. Analyzing different types of plays (e.g., corner kicks, penalty kicks, free kicks) requires different considerations. This work focuses on the analysis of corner kick plays. However, the central ideas apply to analyzing all types of plays. While prior analyses (univariate, bivariate, multivariate) have explored the link between contextual factors (e.g., match period, type of defensive marking) and the level of success of a corner kick (e.g., shot, shot on goal, goal), there has been no attempt to combine spatiotemporal event data (sequences of ball movements through the field) and contextual information to determine when and how (strategy) a particular type of corner kick play (tactic) is more likely to succeed or not. To address this gap, we propose an approach that (1) transforms spatiotemporal data into an alternative representation suitable for mining sequential patterns, (2) identifies and characterizes the sequential patterns used by offensive teams to move the ball toward the scoring zone (tactics), and (3) extracts contrast patterns to identify under what conditions different tactics result in increased chances of success or failure; we call these conditions strategies. Our results suggest that favorable and unfavorable conditions for tactic application are not the same across different tactics, supporting the argument that there is a benefit in performing an analysis that treats different tactics separately, where spatiotemporal information plays a crucial role. Unlike prior works on the corner kick, our approach can capture how the interaction between multiple contextual factors impacts the outcome of a corner kick. At the same time, the results can be explained to others in natural language.

Keywords:

advanced artificial intelligence; corner kick; data mining; football; soccer; set piece; sports analytics; strategy discovery

1. Introduction

Until recently, automated association football (henceforth called football, for short) analysis was limited by the quantity and relevance of the available data, which required researchers to be responsible for both the collection and analysis of relevant data. Fortunately, growing interest in sports analytics has led to the development of more advanced data collection techniques [1]. In football, we are moving from simple match sheet data with high-level information such as line-ups, substitutions, goals, and cards to other sources of information such as optical tracking and event stream data [2]. Event stream data, also known as soccer logs, provide descriptions of all on-the-ball actions performed by players (e.g., passes, shots, and fouls), along with their location in the field [2]. Unlike previous sources of data, event stream data are a source of spatiotemporal information; the events in the log are ordered based on their occurrence in time while simultaneously tracking the spatial location of each event in the field. Event stream data are becoming increasingly available for research purposes (e.g., [3,4,5]), opening the possibility for further studies on data-driven decision making.

In general, a strategy can be understood as a plan or series of activities designed to achieve a major or overall aim [6,7,8]. In football, the study of strategies can be separated by the type of play under study (e.g., corner kick, penalty kick, free kick, open play). The core ideas we use to analyze strategies in this work apply to all types of plays within football. As we shall see, context plays a vital role in defining strategies. Therefore, a strategy to execute a corner kick will be inherently different from the strategy to execute a free kick, and their automated analysis will require different considerations. For this reason, in this work, we focus on the study of strategies for the corner kick use case. Corner kicks are one of the most critical plays in football. Numerous studies have estimated that set plays (e.g., corner kick and free kick) account for approximately 30% to 40% of goals [9,10,11,12], with corner kicks being the most effective among them [10]. Although the conversion rate of corner kicks is low, at approximately 2.1% [12], it remains relatively high compared to other types of plays, such as open plays, which have a conversion rate of approximately 1.1% [12].

1.1. Related Work

When studying strategies for the corner kick, a common and simple approach is to conduct various types of univariate [10,13,14,15,16] or bivariate [10,16,17,18,19,20] analyses. For example, using a chi-squared test of independence, researchers have looked for significant associations between various contextual factors (e.g., clock time, origin corner, and the number of attackers) and the level of success of a corner kick (e.g., shot, shot on goal, and goal). However, these types of analyses have been criticized for analyzing different elements of the game in isolation [21,22,23], failing to capture the underlying complexity of the game or the effect of confounding variables on the outcome of a play [22]. One widely proposed solution to address this concern is to perform a joint analysis of different contextual factors [23]. Therefore, more recently, multivariate analyses (e.g., logistic regression) have emerged [10,16], enabling the modeling of interactions between contextual factors and their relationship with corner kick success. However, none of these approaches provide an understanding of how teams move the ball through the field. Nevertheless, as suggested in previous works [2,24,25], particular sequences of ball movements can result in an advantage opportunity. Therefore, failing to integrate these sequences into the analysis provides a limited perspective on successful strategies. This limitation can be partly explained by the lack of integration of spatiotemporal information for most research on corner kicks. Furthermore, approaches such as logistic regression models may be comprehensible to individuals with a strong statistical background but can pose difficulties for those without such expertise [26]. In general, the ability to create explainable strategy models that can be easily used by coaches and practitioners is an important element in the data-driven analysis of the game [27].

From a broader perspective, beyond the corner kick, there are roughly three approaches to carrying out the analysis [2]. The first approach is to summarize a team’s playing style using several features and then cluster teams based on those features [24,28]. The second approach involves investigating different pattern mining algorithms [25,29,30,31,32,33,34,35,36]. Finally, the third approach attempts to model the complete behavior of the team in a network-based approach, such as a passing network or a Markov network [23,37,38,39]. It is worth noting that while some works beyond the corner kick have studied sequential information from spatiotemporal data (e.g., [2,24,25,30]), these other works usually disregard the influence of context [2]. The difference in approaches between the analysis of strategies within and beyond the corner kick highlights what has been described as a lack of conceptual connection between different works [40]. As implied in previous works [23,40], this issue likely stems from a lack of clear definitions for both tactics and strategies, which makes it challenging for researchers to define what their work should aim to find and to compare their findings with others. Furthermore, it suggests that following a top-down approach in which definitions for tactics and strategies are given upfront, as opposed to implicitly defining them from the proposed analytical framework (bottom-up), could help bring more clarity to the field.

1.2. Contributions and Outline

Although several schemes have been proposed to analyze corner kick strategies in a data-driven manner, there have been no efforts to combine spatiotemporal and contextual information into a single analytical framework to understand how particular sequences of ball movements and the context in which they are executed influence the outcome of a corner kick. Motivated by the identified gaps, our work takes a top-down approach to study strategies for the corner kick using a joint analysis of spatiotemporal and contextual information in a language closer to the actors of the game. We aim to intuitively describe the elements of successful plays to support decision making for the corner kick use case.

A summary of our contributions is as follows:

An alternative representation of the field designed to facilitate the analysis of corner kick plays, regardless of the side of the field where the play is executed.
The identification and characterization of recurrent sequences of events across multiple corner kick executions used by offensive teams to move the ball toward the scoring zone.
The identification and characterization of favorable and unfavorable conditions for the application of such sequences using terminology that can be easily explained to others in natural language.

Overall, our approach provides a new perspective for analyzing corner kick plays in a way in which results can be easily communicated to practitioners, and in which sequences of ball movements are integrated into the analysis.

The remainder of this paper is organized as follows. Section 2 presents the materials and methods used in this study. This section is divided into Section 2.1, which describes how to prepare event data to enable finding recurrent sequences of ball movements in corner kick plays; Section 2.2, which describes how to employ Sequitur to uncover such recurrent game sequences; and Section 2.3, which describes how to use PBC4cip to identify the conditions that anticipate the success (or not) of specific game sequences. Section 3 presents our results; Section 3.1 focuses on recurrent game sequences, and Section 3.2 focuses on the contextual conditions that anticipate their success (or not). Finally, Section 4 and Section 5 present the discussion and future work.

2. Materials and Methods

Football data are usually owned by specialized companies and are rarely publicly available for scientific research [3]. To date, the event log from [3] is the largest open collection of football logs ever released. It provides the main on-the-ball actions during each match for an entire season of seven competitions. In total, the event data set captures 1941 matches and 3,251,294 events. In addition to the event log, the authors provide six other data sets corresponding to information about the competitions, matches, teams, players, referees, and coaches, which provide a rich source of contextual information about the events in the event log. The size, quality, and type of information provided in these data sets make them suitable for our analysis.

While different data vendors provide event stream data in different formats, an event typically contains information about what action was taken, who performed the action, when the action was taken, and where on the field it was taken [23] (in our case both the initial and final position of the event are captured). The log captures ten different events: Pass, Duel, Others on the ball, Free kick, Interruption, Shot, Foul, Save attempt, Offside, and Goalkeeper leaving line. Passes are the most frequent events, accounting for approximately 51% of the total events. Duels are the second most frequent events (∼27%). The remaining events each account for less than 8% of the log. The log contains 19,316 corner kicks. On average, there are roughly ten corner kick executions per match (19,316/1941). For a complete description of possible events, subevents, tags, and other metadata refer to [3].

2.1. Preparing and Representing Corner Kick Event Data

Event stream data serves multiple purposes, including informing broadcasters, and is not specifically tailored for data analysis [2]. One problem with using event stream data for analyzing corner kicks is that the log does not explicitly indicate all the events associated with a particular corner kick execution. We show how dividing the log into sequences of related events, which we call plays, allows us to focus our analysis on events associated with different corner kick executions. Next, using plays as the unit of analysis, we define tactics as recurrent sequences of events across multiple plays used by offensive teams to move the ball toward the scoring zone. Building on top of tactics, we define strategies as the contextual conditions (e.g., match period, player height) that anticipate the success of a particular tactic.

Discovering such tactics requires mining frequent subsequences in a set of sequences (plays), a problem for which algorithms usually require sequences to be represented as lists of symbols. Therefore, changing the representation of plays to sequences of symbols (representing key events) is required to enable tactical discovery; we call this new representation symbolic plays. However, the problem with converting plays into symbolic plays is that each event in a play contains a large amount of information that needs to be condensed into a single symbol. This symbol should convey enough meaning so that the tactics discovered using this representation are intuitive and informative [2]; the ability to explain the findings to coaches and practitioners is relevant for generating an impact on the decision-making process in the field [27]. We show that by removing certain information from the plays, discretizing the event’s location, and establishing a common frame of reference across multiple corner kick executions (regardless of their initial position), we can create a compact and informative alphabet for constructing our symbolic plays.

2.1.1. Play Extraction and Preprocessing

Similar to previous works [2,25], we divide the log into sequences of related events, which we call plays. However, our focus is on corner kick plays; we define a corner kick play as beginning with the occurrence of a Free kick event whose subevent type is Corner. A corner kick play ends when any of the following four conditions is met: (1) the occurrence of a terminal event (Foul, Clearance, Interruption, Offside); (2) a significant period of inactivity (for consistency with [2], a time delta between contiguous events of more than 10 s); (3) a change in ball possession; or (4) the ball leaving the field. Furthermore, we observed that the data set can contain invalid position information (e.g., the same initial and final positions for certain events). Therefore, plays where events contain invalid position information are removed from our data set.
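The segmentation rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event log is taken to be a time-ordered list of dictionaries, and the keys `event`, `subevent`, `team`, `t` (seconds) and `out_of_field` are hypothetical names, not the field names of the data set in [3]. Whether the event that triggers a cut is kept in the play is also an assumption made here.

```python
TERMINAL_EVENTS = {"Foul", "Clearance", "Interruption", "Offside"}
MAX_GAP_SECONDS = 10.0  # inactivity threshold stated in the text


def is_terminal(ev):
    # "Clearance" may be recorded as a subevent, so both fields are checked
    # (assumption; the key names are hypothetical).
    return ev["event"] in TERMINAL_EVENTS or ev.get("subevent") in TERMINAL_EVENTS


def extract_corner_plays(events):
    """Split a time-ordered event list into corner kick plays."""
    plays, current = [], None
    for ev in events:
        # A play starts with a Free kick whose subevent type is Corner.
        if ev["event"] == "Free kick" and ev.get("subevent") == "Corner":
            if current:
                plays.append(current)
            current = [ev]
            continue
        if current is None:
            continue  # event does not belong to any corner kick play
        gap_too_long = ev["t"] - current[-1]["t"] > MAX_GAP_SECONDS
        possession_changed = ev["team"] != current[0]["team"]
        if gap_too_long or possession_changed:
            # The triggering event is left out of the play (assumption).
            plays.append(current)
            current = None
            continue
        current.append(ev)
        # A terminal event or the ball leaving the field closes the play.
        if is_terminal(ev) or ev.get("out_of_field", False):
            plays.append(current)
            current = None
    if current:
        plays.append(current)
    return plays
```

Filtering out plays with invalid position information, as described above, would be a separate pass over the returned list.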

Given that our tactics focus on how the ball moves toward the scoring zone, we remove all non-pass events (Duel, Others on the ball, Free kick, Interruption, Shot, Foul, Save attempt, Offside, Goalkeeper leaving line) from our plays, effectively converting them into pass sequences. Nevertheless, the occurrence of the removed events in the plays is stored as play metadata. Some of this metadata (e.g., number of duels) will be used as contextual information, whose impact on the outcome of a play is considered as part of the analysis described in Section 2.3. Recall that, initially, the first event of our plays was a Free kick event whose subevent type was Corner. This event can essentially be considered a type of pass. Therefore, before removing all Free kick events, Free kicks whose subevent type is Corner are changed into Pass events.

Additionally, for two contiguous pass events in a pass sequence, the second pass either begins at the same location where the first pass ends, or begins at a different location (e.g., as a result of the receiving player running with the ball or a loose ball). These two possibilities are illustrated in Figure 1 which, for ease of reference, shows a non-corner kick play. Notice that in the first scenario (Figure 1a), all passes in the sequence are taken from the final position of the preceding pass, describing at all times how the ball moves toward its final destination. However, in the second scenario (Figure 1b), the pass sequence fails to describe how the ball moves from the final position of the first pass ($fp_{1}$) to the initial position of the second pass ($ip_{2}$). In general, we refer to this scenario as a position mismatch, and it is caused by an implicit movement of the ball.

Implicit ball movements in our plays prevent us from creating uninterrupted sequences of events that describe how the ball travels toward the scoring zone. Tactics extracted from such sequences will not be able to describe the step-by-step movement of the ball. Therefore, we capture the implicit movements of the ball by introducing a synthetic Ball movement event whenever a position mismatch is detected. A typical player sprint is on average 14 m long [41]; we define a position mismatch as occurring whenever the Euclidean distance between $fp_{i}$ and $ip_{i+1}$ is greater than ten units, which roughly matches this distance. We assign the initial position of the synthetic event to be equal to $fp_{i}$ and its final position to be equal to $ip_{i+1}$. Figure 2 shows how this event is added to address the position mismatch shown in Figure 1b. Due to the high level of redundancy between the initial and final positions in the enhanced sequence, we only keep the final position information.
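As a minimal sketch of this step, the function below inserts a synthetic Ball movement event between two contiguous passes whenever the end of one and the start of the next are more than ten units apart. The dictionary layout (`start`, `end` as `(x, y)` tuples) is a hypothetical choice for illustration.

```python
import math

MISMATCH_THRESHOLD = 10.0  # field units, as stated in the text


def add_ball_movements(passes):
    """passes: list of dicts with 'start' and 'end' (x, y) tuples, in order."""
    enhanced = []
    for i, p in enumerate(passes):
        enhanced.append({"event": "Pass", "start": p["start"], "end": p["end"]})
        if i + 1 < len(passes):
            fp_i = p["end"]                  # final position of pass i
            ip_next = passes[i + 1]["start"]  # initial position of pass i + 1
            if math.dist(fp_i, ip_next) > MISMATCH_THRESHOLD:
                # Synthetic event covering the implicit movement of the ball.
                enhanced.append(
                    {"event": "Ball movement", "start": fp_i, "end": ip_next}
                )
    return enhanced
```

After this pass, every event starts where the previous one ended (up to the threshold), which is why keeping only the final positions loses little information.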

2.1.2. Abstract Representation for Corner Kick Plays

In this section, we create an abstract representation of corner kick plays. Notably, our representation retains only the play’s crucial information for our tactical discovery—the location, name, and sequential order of each event—while other details, such as the player executing the action or the match period, are either disregarded or stored as play metadata.

A crucial component of our representation is the use of a unified frame of reference to represent corner kick plays. Note that a corner kick can be taken from any of the four corners of the field. However, the data set we use measures an event’s position relative to the offensive team. The X-coordinate value indicates the nearness of the event (in percentage) to the opponent’s goal. In contrast, the Y-coordinate value indicates the nearness of the event (in percentage) to the right side of the field. While technically, there are four possible origin locations for our plays, when measured from the offensive team’s point of view, there are only two: a left-side corner, starting from $(x = 100, y = 0)$, and a right-side corner, starting from $(x = 100, y = 100)$.

Under this convention, and due to the different orientations, our analysis needs to be separated into two parts: left-side corners and right-side corners. However, one can consider that left and right-sided corner kicks are mirrored versions of each other, and therefore, they are, to some extent, equivalent. By placing all the corner kicks into a unified frame of reference, we are left with a single type of corner kick, avoiding the need to separate the analysis into two parts and increasing the number of corner kick instances available during the analysis. In our new frame of reference, we establish that all corner kicks start at $(x = 100, y = 100)$. Figure 3 shows an example of this transformation.

Note that after the mirroring operation, it is not possible to determine the initial side of the corner kick. Previous works [10] have suggested that the side of the field from which the corner kick originates could influence its outcome. Therefore, before executing the mirroring operation, we save the side (left or right) from which the corner kick originates as part of the play metadata.
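The mirroring operation amounts to reflecting the Y coordinate of left-side corners, $y \mapsto 100 - y$, while recording the original side. A minimal sketch follows; the event layout and the rule used to decide the side (initial `y` below 50) are assumptions made for illustration.

```python
def to_unified_frame(play):
    """Return (play in the unified frame, original side).

    play: list of dicts with 'start' and 'end' (x, y) tuples; the first
    event is assumed to be the corner kick itself.
    """
    _, y0 = play[0]["start"]
    side = "left" if y0 < 50 else "right"  # saved as play metadata
    if side == "right":
        # Right-side corners already start at (100, 100).
        return [dict(ev) for ev in play], side
    mirrored = [
        {
            **ev,
            "start": (ev["start"][0], 100 - ev["start"][1]),
            "end": (ev["end"][0], 100 - ev["end"][1]),
        }
        for ev in play
    ]
    return mirrored, side
```

Returning the side together with the transformed play keeps the information that the mirroring would otherwise erase.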

Another crucial element of our representation is the spatial abstraction of events. Unlike other approaches [25,42], which use a grid to divide the field into symmetric regions, our regions are proposed using, to some extent, domain knowledge. Figure 4 shows our representation of the field together with the names of the regions. In this new representation, the field is divided into 12 unique regions. Notice that the size of our regions drives a trade-off between the expressiveness of our plays and our ability to detect similarities between them. If the regions are too large, the events in the plays are likely to fall into the same region, making it easier for us to compare them. However, the plays will have little expressive power, meaning that we cannot tell what happened during the game with much detail. On the other hand, if the regions are too small, we may obtain very expressive plays; however, because of the increased number of spatial locations, it will be harder to detect similarities between them.

By creating a histogram of the final positions (destination regions) of the events in our plays, we can validate our assumption that most events in our plays will be targeted toward the penalty and goal areas (accumulated frequency of 58.37%). Notably, region Penalty Middle (PM) is the most frequent destination (around 25% of the events), which is reasonable considering that this region is immediately in front of the opponent’s goal. While region Backfield (B) is the second most frequent destination region (around 17% of events), region Left flank (L) has a similar frequency but covers a much smaller area. These observations suggest that our domain-knowledge-based division helps focus on regions that can be relevant from the perspective of the game’s actors.

2.1.3. Symbolic Representation for Corner Kick Plays

At this stage, our intermediate representation of plays can be described by sequences of two-element tuples $(et, dr)$, where:

$et$ is an event type, drawn from the set of all possible event types in our plays:

$$et \in \{\text{Pass}, \text{Ball movement}\};$$

$dr$ is a destination region, drawn from the set of all possible destination regions:

$$dr \in \{B, L, R, RL, RM, RR, PL, PM, PR, 1P, GM, 2P\}.$$

For example, the corner kick execution shown in Figure 3b can be represented by the following sequence:

$$[(\text{Pass}, PL), (\text{Pass}, L), (\text{Pass}, B), (\text{Pass}, PM)]$$

The intermediate representation can be converted into a sequence of symbols by mapping each possible $(et, dr)$ pair to a unique character. The set of all possible characters creates an alphabet which we use to construct symbolic plays. Considering two possible events and twelve possible destination regions, the cardinality of our alphabet is $c = 24$. However, considering the Ball movement event is rare (∼8% of the events in our plays), we decided not to include its position information during the mapping; performing this step allows us to approximately halve the cardinality of our alphabet, leading to a more compact representation and making it more likely to find frequent subsequences. Thus, in the end, our alphabet has thirteen characters ($c = 13$): twelve for all Pass events with position information, and one for all Ball movement events. The mapping between our intermediate representation and the characters in our alphabet is shown in Figure 5.
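A possible encoding of this mapping is sketched below. The actual character assignment is the one in Figure 5, which is not reproduced here; the letters used in this sketch are hypothetical, except that A, B and C are assigned to Backfield, Left flank and Rebound left to agree with the example discussed in Section 2.2.2.

```python
import string

# Hypothetical ordering: only the first three entries follow the text.
REGIONS = ["B", "L", "RL", "RM", "RR", "R", "PL", "PM", "PR", "1P", "GM", "2P"]

# One character per (Pass, region) pair: twelve symbols.
PASS_SYMBOL = dict(zip(REGIONS, string.ascii_uppercase))

# A single character for every Ball movement event, whatever its region.
BALL_MOVEMENT_SYMBOL = string.ascii_uppercase[len(REGIONS)]


def to_symbolic_play(play):
    """play: list of (event_type, destination_region) tuples."""
    symbols = []
    for event_type, region in play:
        if event_type == "Ball movement":
            symbols.append(BALL_MOVEMENT_SYMBOL)
        else:
            symbols.append(PASS_SYMBOL[region])
    return "".join(symbols)


ALPHABET = set(PASS_SYMBOL.values()) | {BALL_MOVEMENT_SYMBOL}
```

Dropping the region of Ball movement events is what brings the alphabet from 24 down to 13 characters.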

2.2. Discovery of Corner Kick Tactics

In our work, a tactic refers to a recurrent sequence of events in our plays. Importantly, events have an inherent hierarchical structure; we can group events to represent more complex events. For example, as described in [43], entering a room might require unlocking the door, opening the door, and walking through the door. We show how, by expressing plays in terms of grammar, we can not only discover recurrent sequences of events in our plays but also establish hierarchical relationships between them. This way, certain rules in our grammar (e.g., rules composed of other rules) will help us express frequent high-level behaviors that we consider valuable within this domain.

2.2.1. Algorithm Selection

Previous works [43,44] have suggested that when working with event data, we should strive for our models to capture the event’s structural relationships, i.e., to describe how we can aggregate low-level events into higher-level events or behaviors. From the perspective of our work, frequent high-level behaviors can be more interesting than frequent behaviors on their own. Moreover, being able to describe plays in terms of high-level behaviors can help us express them more intuitively.

Therefore, we want an algorithm that allows us to meet three objectives, each corresponding to a desirable characteristic of our tactics:

To find recurrent sequences of events.
To establish a hierarchy between recurrent sequences of events to detect high-level behaviors.
To express corner kick plays in terms of high-level behaviors.

We chose Sequitur [45], an algorithm designed for grammar induction and compression, to achieve our objectives. As suggested in previous works [46,47,48,49], grammar induction shares many similarities with traditional sequence mining; this process can also be used to find recurrent sub-sequences in sequence data, allowing us to tackle similar problems. However, unlike traditional sequence mining algorithms, the inherent structure of a grammar mirrors the underlying structure of events, making it possible to establish hierarchical relationships between recurrent sequences of events, allowing us to detect high-level behaviors and describe the plays in terms of such behaviors. In addition, the capability of Sequitur to run in linear time makes it a good choice for analyzing large data sets.

2.2.2. Discovery of Tactics with Sequitur

We can roughly distinguish between two main types of corner kicks: direct and indirect [16]. Direct corner kicks involve just one touch to reach the scoring zone before the defining outcome occurs, so we give them no further consideration. On the other hand, indirect corner kicks require two or more touches to reach the scoring zone before the defining outcome; we employ Sequitur to discover tactics within the indirect corner kick plays only. The relevance of indirect corner kick executions in the data set is worth noting. We observed that from the 17,773 valid corner kick executions in the data set, direct corner kick executions (11,904) succeed 22.5% of the time, while indirect corner kicks (5896) succeed 29.5% of the time, where we have defined a successful corner kick as one that leads to either a shot or a goal and does not result in an offside call.

Notice that, in its original form, Sequitur is designed to derive a grammar from a single sequence of symbols. However, our analysis is not constrained to detecting repetition within a single game sequence (play); we aim to find recurrent sequences of events across multiple plays. For this reason, we used the public implementation of the Sequitur algorithm in [50], which has already been adapted to handle multiple sequences. To this end, a single sequence is generated from the shorter sequences by inserting a special symbol “|” between them, and the algorithm is then run on the result. By design, this special symbol cannot be made part of any rule, and thus it disallows finding recurrent sub-sequences that involve parts of two different sequences.

We can construct the input sequence for Sequitur by concatenating the symbolic representation of plays into a string. For example, the string “ABC|ABC|ABD|ABD” represents four different corner kick executions. Table 1 shows the output of the Sequitur algorithm for this input sequence.
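The construction of the input string, and the effect of the separator, can be illustrated as follows. This sketch is not Sequitur itself: it only counts digrams (pairs of adjacent symbols), the repetition that grammar induction builds on, and shows that no digram spans two plays. The function names are hypothetical.

```python
from collections import Counter

SEPARATOR = "|"


def build_sequitur_input(symbolic_plays):
    """Concatenate symbolic plays into a single separator-delimited string."""
    for play in symbolic_plays:
        if SEPARATOR in play:
            raise ValueError("the separator must not be part of the alphabet")
    return SEPARATOR.join(symbolic_plays)


def count_digrams(sequence):
    """Count adjacent symbol pairs, never crossing a play boundary."""
    counts = Counter()
    for play in sequence.split(SEPARATOR):
        for a, b in zip(play, play[1:]):
            counts[a + b] += 1
    return counts
```

For the four plays in the example, `build_sequitur_input(["ABC", "ABC", "ABD", "ABD"])` reproduces the string used in the text, and the digram "AB" is counted once per play.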

First, notice that, after processing, the input sequence is embodied in the start rule of the grammar (R0). This rule is composed of other rules, and it can be expanded by recursively unfolding the other rules until the entire input sequence is reproduced. Additionally, we can distinguish between rules included in the definition of R0 (such as R1 and R2 in our example) and those that are not (R3 in our example). In our work, rules appearing in the definition of R0 are considered relevant for their ability to express plays in a more concise way; we say that these rules are able to express high-level behaviors and associate them with the concept of tactics.

In general, tactics can be composed of other grammatical rules and symbols in the alphabet of the input sequence. We can describe the meaning of tactics by recursively unfolding the rules within a tactic and translating the symbols back into the events they represent. The recursive unfolding of R1 produces the sequence A B C. Using the mapping from the previous section (Figure 5), we find that this sequence describes three consecutive passes through regions Backfield (B), Left flank (L), and Rebound left (RL). For simplicity, we express our tactics using a string that combines the event type and destination region after unfolding all nested rules. Tactic R1 is expressed by the string: “Pass backfield, Pass left flank, Pass Rebound left”.
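The recursive unfolding can be sketched as below. The grammar shown is an assumption: it is one grammar consistent with the description above (R1 and R2 appear in R0, R3 does not, and R1 unfolds to A B C), not a copy of Table 1. Only the meanings of A, B and C are given in the text, so other symbols are left untranslated.

```python
# Assumed grammar for the input "ABC|ABC|ABD|ABD".
GRAMMAR = {
    "R0": ["R1", "|", "R1", "|", "R2", "|", "R2"],
    "R1": ["R3", "C"],
    "R2": ["R3", "D"],
    "R3": ["A", "B"],
}

SYMBOL_TO_EVENT = {
    "A": "Pass backfield",
    "B": "Pass left flank",
    "C": "Pass Rebound left",
}


def unfold(symbol, grammar):
    """Recursively expand a rule into the terminal symbols it stands for."""
    if symbol not in grammar:
        return [symbol]  # terminal symbol of the alphabet
    expanded = []
    for s in grammar[symbol]:
        expanded.extend(unfold(s, grammar))
    return expanded


def tactics(grammar, start="R0"):
    """Rules that appear directly in the definition of the start rule."""
    return sorted({s for s in grammar[start] if s in grammar})


def describe(rule, grammar, names):
    """Translate an unfolded rule back into event descriptions."""
    return ", ".join(names.get(s, s) for s in unfold(rule, grammar))
```

With this grammar, `tactics(GRAMMAR)` keeps R1 and R2 and leaves out R3, and `describe("R1", GRAMMAR, SYMBOL_TO_EVENT)` gives the string quoted in the text.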

2.2.3. Identifying Relevant Tactics

We define the frequency, length, and success rate of a tactic to highlight relevant tactics among the output of the algorithm. We can uncover the most common tactics based on their frequency. Equation (1) represents the frequency of a tactic t.

$$f(t) = \text{Number of occurrences of tactic } t \text{ in grammar rule R0} \tag{1}$$

Equation (2) represents the length of a tactic t. Longer tactics result from the aggregation of more events than shorter tactics do. Therefore, they can describe more complex behaviors which could be of interest.

$$l(t) = \text{Number of symbols in the recursive unfolding of tactic } t \tag{2}$$

Furthermore, we define that an offensive attempt is successful whenever a play leads to either a shot or a goal and does not result in an offside call. Equation (3) represents the success rate of a tactic t.

$$s(t) = \frac{\text{Number of successful plays where tactic } t \text{ occurs}}{f(t)} \tag{3}$$

Practitioners may be most interested in tactics with high success rates. Nevertheless, directly comparing their success rates might lead to misleading conclusions because not all tactics will be used in the same proportion. Therefore, we can also look for statistically significant associations between the use (or not) of a tactic and its success (or not) using a chi-squared test with a significance level of 0.05. To deal with the multiple comparison problem (one hypothesis test per tactic), we can apply the Bonferroni correction to our significance level ($0.05 / N$), where N is the number of tactics obtained from our corner kick play data set.
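The measures in Equations (1) and (3) and the corrected test can be sketched with the standard library alone. Several points are assumptions made for illustration: plays are given as lists of rule symbols taken from R0, the test is a Pearson chi-squared test on a 2×2 table (tactic used or not, play successful or not) without continuity correction, and the function names are hypothetical. For one degree of freedom, the p-value equals $\operatorname{erfc}(\sqrt{\chi^2 / 2})$.

```python
import math


def frequency(tactic, r0_plays):
    """Equation (1): occurrences of the tactic in the plays encoded in R0."""
    return sum(play.count(tactic) for play in r0_plays)


def success_rate(tactic, r0_plays, outcomes):
    """Equation (3): successful plays containing the tactic, over f(t)."""
    f = frequency(tactic, r0_plays)
    wins = sum(1 for play, ok in zip(r0_plays, outcomes) if ok and tactic in play)
    return wins / f if f else 0.0


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for the table [[a, b], [c, d]].

    a: tactic used, success      b: tactic used, failure
    c: tactic not used, success  d: tactic not used, failure
    Returns (statistic, p_value); no continuity correction (assumption).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-squared, 1 degree of freedom
    return stat, p_value


def is_significant(p_value, n_tactics, alpha=0.05):
    """Bonferroni correction: compare against alpha / N."""
    return p_value < alpha / n_tactics
```

The Bonferroni threshold becomes stricter as the number of tactics grows, so an association that passes at 0.05 for a single test may no longer pass once all N tactics are tested.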

2.2.4. Play Compression

After applying Sequitur to the string representing our plays, the input sequence is embodied in the start rule of the grammar (R0). Individual plays can be retrieved by splitting R0 back into multiple sequences, each representing one corner kick execution in terms of other grammar rules (tactics). Since grammar rules consist of two or more symbols, plays represented by rules use a lower number of symbols than the original sequences. The compression factor of a play can be computed by dividing the number of symbols in the input play by the number of symbols in the output play. Larger average compression factors may be considered a good indicator of the ability of the grammar to find a more concise representation of the plays.
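As a small worked example of the compression factor, the sketch below compares each input play (a string of alphabet symbols) with its representation in terms of grammar rules (a list of symbols). The function names are hypothetical, and plays are assumed to be non-empty.

```python
def compression_factor(input_play, output_play):
    """Symbols in the input play divided by symbols in the output play."""
    return len(input_play) / len(output_play)


def average_compression(input_plays, output_plays):
    """Mean compression factor over all plays."""
    factors = [
        compression_factor(i, o) for i, o in zip(input_plays, output_plays)
    ]
    return sum(factors) / len(factors)
```

For instance, a play "ABC" that the grammar rewrites as the single rule R1 has a compression factor of 3, whereas a play left unchanged has a factor of 1.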

2.3. Discovery of Corner Kick Strategies

In our work, a strategy refers to the contextual conditions (e.g., match period and player height) that anticipate the success of a particular tactic. Identifying such conditions can help practitioners decide when a tactic is appropriate. However, we can also gain insights from the conditions that anticipate the failure of a tactic; practitioners could learn when and why certain tactics should not be used.

A common saying in football states that a tactic is only valuable if supported by a strong technique [40,51,52,53]. In addition to skill, other factors, such as physiological, environmental, and player- or team-specific conditions, can also affect the outcome of a tactic. However, not all of these factors are typically captured by existing public data sets (e.g., the player’s emotional state and physical fatigue). Therefore, our selection of contextual factors is limited to the information available within the data sets considered. Figure 6 shows the contextual factors used in this work, extracted from the metadata of the events used in our plays and from complementary data sets from [3,54]. Our selection of contextual factors is based on variables considered of interest in previous work, domain knowledge, and, more importantly, their availability. Notice that we have grouped the attributes according to the type of information they capture. A complete description of the attributes shown in Figure 6 is presented in Appendix A.

We now describe how to find differences between successful and failed plays that use a particular tactic through contrast pattern mining. In this context, a pattern is an expression defined in a certain language that describes a collection of objects [55,56]. A pattern is usually represented by a conjunction of relational statements, each of the form $[f_{i} \# v_{j}]$, where $v_{j}$ is a value in the domain of feature $f_{i}$ and # is a relational operator from the set $\{\in, \notin, =, \neq, \leq, >\}$ [55,57]. For example, $[\text{origin corner} = R] \land [\text{play duration} > 10\ \text{s}]$ is a pattern that describes plays starting from the right side of the field, whose duration is greater than ten seconds. Each pattern has a corresponding support, which indicates the proportion of objects in a class that meet the description of the pattern [58]. Accordingly, a contrast pattern is a pattern that appears significantly more often in one class than in the remaining classes [58,59]. It is worth noting that pattern-mining algorithms often produce hundreds or thousands of patterns; we show how we can perform a selection of patterns to infer the most favorable and unfavorable conditions for tactic application.
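To make the notions of pattern and support concrete, the sketch below encodes a pattern as a list of (feature, operator, value) items and computes its support in each class. The attribute names, the ASCII spelling of the operators, and the plays are hypothetical and chosen only for illustration.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

Item = Tuple[str, str, Any]   # (feature, operator, value)
Play = Dict[str, Any]         # contextual attributes of one play

# ASCII spellings of the relational operators (assumed naming).
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">": operator.gt,
    "in": lambda x, values: x in values,
    "not in": lambda x, values: x not in values,
}


def covers(pattern: List[Item], play: Play) -> bool:
    """A pattern covers a play when every item in the conjunction holds."""
    return all(OPS[op](play[feature], value) for feature, op, value in pattern)


def support(pattern: List[Item], plays: List[Play]) -> float:
    """Proportion of plays in a class that are covered by the pattern."""
    if not plays:
        return 0.0
    return sum(covers(pattern, p) for p in plays) / len(plays)


pattern = [("origin_corner", "=", "R"), ("play_duration", ">", 10)]

# Hypothetical plays, split by class.
successful = [
    {"origin_corner": "R", "play_duration": 12},
    {"origin_corner": "R", "play_duration": 8},
    {"origin_corner": "L", "play_duration": 15},
    {"origin_corner": "R", "play_duration": 11},
]
failed = [
    {"origin_corner": "R", "play_duration": 14},
    {"origin_corner": "L", "play_duration": 9},
    {"origin_corner": "L", "play_duration": 20},
    {"origin_corner": "R", "play_duration": 5},
]

print(support(pattern, successful), support(pattern, failed))  # 0.5 0.25
```

In this toy example the pattern covers half of the successful plays but only a quarter of the failed ones; whether such a difference is significant is a separate question addressed by statistical testing.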

2.3.1. Algorithm Selection

Contrast pattern mining algorithms can be broadly categorized into exhaustive-search-based (ESB) algorithms, which execute an exhaustive search of a combination of values for features that are significant in one class in comparison with other classes, and decision-tree-based (DTB) algorithms, which extract contrast patterns from a collection of decision trees [57]. The main drawback of ESB algorithms is that they usually start with an independent a priori discretization of all numeric features [57]; discretizing a numerical attribute without considering the values of other features could hide relations in the objects of a class, causing an important information loss [57,60,61]. Furthermore, mining contrast patterns using ESB algorithms is a challenging problem because of the high computational cost due to the exponential number of candidate patterns [57]. Another drawback of ESB algorithms is that they only extract patterns having items in the form $[f_{i} = v_{j}]$; this reduces the patterns’ discriminative power because other relational operators, such as $\{\in, \notin, \neq, \leq, >\}$, are not used [57]. To address the drawbacks of ESB algorithms, DTB algorithms have been introduced. According to [57], DTB algorithms can be better than ESB algorithms for three reasons. First, decision trees perform local discretization of numeric features. Second, decision trees have a small proportion of candidate features even in longer tree paths, which helps to reduce the search space. Third, decision trees can handle missing values by introducing a penalizing factor in the measure for evaluating candidate splits. Among DTB contrast pattern miners, Random Forest miners have shown better diversity of high-quality patterns than other approaches based on decision trees [59,61].

For these reasons, we have decided to employ the publicly available implementation [62] of the PBC4cip algorithm [63], which uses a Random Forest miner to extract contrast patterns. PBC4cip has been primarily applied in handling classification tasks for class imbalance problems [63,64]. However, it has also been used in non-classification settings (e.g., [58]) to characterize distinctions between groups, aligning with the goals of our current work.

2.3.2. Contrast Pattern Mining with PBC4cip

First, we extract the plays associated with each of the tactics found within indirect corner kick plays and their contextual information described in Figure 6. Moreover, we extract plays with a direct corner kick execution (we say these use the direct corner kick tactic, id = 0) as well as their context, and integrate this tactic into our analysis. Figure 7 illustrates our approach for uncovering the differences between successful and failed plays. We use PBC4cip to mine contrast patterns in a per-tactic fashion, i.e., each tactic is analyzed separately.

The public implementation of the PBC4cip algorithm in [62] accepts different parameters that have an influence on the shape and number of contrast patterns found. While most of the default parameters have been used, a brief summary of the relevant parameter choices is shown below.

Univariate decision trees (UDTs). Although multivariate decision trees (MDTs) have shown better classification results than UDTs [64], we considered that univariate relations (e.g., age ≤ 40) are easier to explain than multivariate relationships (e.g., $2 \cdot height + 3 \cdot duration \geq 40$), and so we chose the UDT setting of the algorithm.
One-hundred-and-fifty decision trees. This can be considered a rule of thumb that has been used in previous research with good results [58,64].
Max tree depth of four. This generates contrast patterns with, at most, three clauses (also called items), helping increase the interpretability of the resulting patterns. In [58], it was considered that patterns with three or fewer clauses can be easier to transform into actionable information.

2.3.3. Pattern Filtering

Similar to previous contrast pattern research [58,64,65], we follow the simplification and filtering procedure from [55] to remove duplicate, specific, and redundant contrast patterns. A pattern P1 is more specific than a pattern P2 if P2 is contained in P1 and P1 has at least one more item [55,58]. For example, consider $P_{1} = [\text{origin corner} = R] \land [\text{play duration} > 10\ \text{s}] \land [\text{match period} = \text{First half}]$ and $P_{2} = [\text{origin corner} = R] \land [\text{play duration} > 10\ \text{s}]$. We remove P1 because it is more specific than P2.

Moreover, a pattern may contain redundant items, which can be simplified by removing the most general items. An item $I_{j}$ is more general than another item $I_{k}$ if the set of objects covered by $I_{j}$ is a proper superset of the objects covered by $I_{k}$ [64]. For example, the pattern $[\text{play duration} \leq 10\ \text{s}] \land [\text{play duration} \leq 15\ \text{s}]$ may be simplified to $[\text{play duration} \leq 10\ \text{s}]$ since all plays with a duration of at most 10 s also have a duration of at most 15 s.
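The two filtering steps above can be sketched as follows. This is a simplified illustration rather than the procedure from [55]: item generality is only detected for numeric bounds on the same feature with the same operator (≤ or >), and the attribute names are hypothetical.

```python
from typing import Any, Dict, FrozenSet, Iterable, List, Tuple

Item = Tuple[str, str, Any]      # (feature, operator, value)
Pattern = FrozenSet[Item]


def simplify(items: Iterable[Item]) -> Pattern:
    """Drop redundant (more general) numeric bounds within one pattern.

    For "<=" the smallest threshold is the tightest; for ">" the largest is.
    Items with other operators are kept unchanged (simplifying assumption).
    """
    tightest: Dict[Tuple[str, str], Any] = {}
    others: List[Item] = []
    for feature, op, value in items:
        key = (feature, op)
        if op == "<=":
            tightest[key] = min(value, tightest.get(key, value))
        elif op == ">":
            tightest[key] = max(value, tightest.get(key, value))
        else:
            others.append((feature, op, value))
    bounds = [(f, op, v) for (f, op), v in tightest.items()]
    return frozenset(bounds + others)


def drop_duplicates_and_specific(patterns: Iterable[Pattern]) -> List[Pattern]:
    """Remove duplicates and any pattern that strictly contains another one."""
    unique = set(patterns)
    # q < p is the proper-subset test: p has all items of q plus at least one more.
    return [p for p in unique if not any(q < p for q in unique)]


p1 = simplify([("origin_corner", "=", "R"), ("play_duration", ">", 10),
               ("match_period", "=", "First half")])
p2 = simplify([("origin_corner", "=", "R"), ("play_duration", ">", 10)])

print(drop_duplicates_and_specific([p1, p2, p2]) == [p2])  # True
print(simplify([("play_duration", "<=", 10), ("play_duration", "<=", 15)]))
```

Representing patterns as frozensets makes both duplicate removal and the containment check a matter of set operations.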

Next, we apply statistical testing to our patterns. One important reason for applying statistical testing during contrast pattern mining is to remove spurious correlations in the data [66]. Similar to previous works [67], we test the statistical significance of a contrast pattern by testing the null hypothesis that contrast pattern support is equal across all classes or, equivalently, that contrast pattern support is independent of class membership, using a chi-squared test of independence with a significance level of 0.05. The support counts for a contrast pattern in each class are a form of frequency data that can be analyzed in contingency tables [67]. Since we look for contrasts between only two classes, we can express the frequency counts using a 2 × 2 contingency table, where rows describe the use (or not) of a contrast pattern, and the columns describe their class. Lastly, to deal with the multiple comparison problem (one hypothesis test per contrast pattern), we applied the Bonferroni correction to our significance level ($0.05 / C_{i}$), where $C_{i}$ is the number of contrast patterns found for a given tactic $t_{i}$. As we shall see, the filtering and statistical testing steps described in this section significantly reduce the number of contrast patterns to consider during our analysis.
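For reference, the test statistic for such a 2 × 2 table can be written in closed form. The expression below is the standard Pearson statistic without continuity correction, which is an assumption because the text does not specify whether a correction is used. The cell labels a, b, c, and d are introduced here for illustration: a and b are the successful and failed plays covered by the pattern, and c and d are the successful and failed plays not covered by it.

```latex
\chi^{2} = \frac{n\,(ad - bc)^{2}}{(a+b)(c+d)(a+c)(b+d)},
\qquad n = a + b + c + d,
\qquad \text{reject } H_{0} \text{ if } p < \frac{0.05}{C_{i}}
```

The statistic has one degree of freedom, and the p-value is compared against the Bonferroni-corrected threshold rather than against 0.05 directly.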

2.3.4. Pattern Selection

An important task when working with contrast patterns is the assessment of their quality or discriminative ability [68]. One common way to assess the quality of a contrast pattern is to use support difference. Let $D_{s_{i}}$ and $D_{f_{i}}$ be the sets of successful and failed plays using tactic i, respectively. The support difference of a contrast pattern X is given in Equation (4).

\mathrm{DiffSup}(X) = \left| \mathrm{sup}_{D_{s_{i}}}(X) - \mathrm{sup}_{D_{f_{i}}}(X) \right|

(4)

The support difference is maximized when the supports are as far as possible from each other; this happens when the pattern applies to all plays within one class but none in the other. For this reason, the support difference aligns with our objective of discovering conditions that anticipate the success or failure of a corner kick. While other quality metrics exist, the assessment of different quality metrics is not within the scope of this work.

For a contrast pattern, the class with the highest support determines the class of the pattern [59]. To infer the most favorable and unfavorable conditions for tactic application, we divide the contrast patterns found for a given tactic by their class and focus our analysis on the single most informative high-quality pattern in each class.
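The support difference in Equation (4) and the class assignment can be sketched as follows. The first call uses the counts reported in Section 3.2.2 for the Rebound pattern (15 of 18 successful plays and 21 of 81 failed plays); the candidate list is hypothetical, and choosing the pattern with the largest support difference in each class is our reading of "most informative", stated here as an assumption.

```python
from typing import Dict, List, Tuple


def diff_sup(cov_s: int, n_s: int, cov_f: int, n_f: int) -> Tuple[float, str]:
    """Support difference and class of a pattern.

    cov_s / n_s: covered and total successful plays for the tactic;
    cov_f / n_f: covered and total failed plays for the tactic.
    The class is the one with the higher support (ties fall to "failure" here,
    an arbitrary choice, since such a pattern would not be a contrast pattern).
    """
    sup_s = cov_s / n_s
    sup_f = cov_f / n_f
    return abs(sup_s - sup_f), ("success" if sup_s > sup_f else "failure")


def best_per_class(patterns: List[Tuple[str, int, int, int, int]]) -> Dict[str, Tuple[str, float]]:
    """Keep, for each class, the pattern with the largest support difference."""
    best: Dict[str, Tuple[str, float]] = {}
    for pid, cov_s, n_s, cov_f, n_f in patterns:
        diff, cls = diff_sup(cov_s, n_s, cov_f, n_f)
        if cls not in best or diff > best[cls][1]:
            best[cls] = (pid, diff)
    return best


# Rebound pattern counts from Section 3.2.2.
diff, cls = diff_sup(15, 18, 21, 81)
print(round(diff, 2), cls)  # 0.57 success

# Hypothetical candidate patterns for a single tactic: (id, cov_s, n_s, cov_f, n_f).
candidates = [("p1", 15, 18, 21, 81), ("p2", 9, 18, 20, 81), ("p3", 3, 18, 60, 81)]
print(best_per_class(candidates))
```

Each tactic is processed separately, so n_s and n_f are the class sizes for that tactic only.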

3. Results

We were able to extract a total of 17,733 valid corner kick plays from the event log we used (counting both direct and indirect corner kicks). In the remainder of this section, we focus on showing the type of results and insights we can obtain from the proposed framework by applying our methodology, as a case study, to plays in which the defining outcome occurs within the Penalty Middle (PM) region. We have selected the PM region as more plays end in this region (6541/17,773 = 37%) than in any other region. However, the same process can be applied to studying plays that end in different regions; thus, replicating this approach to other terminal regions makes it possible to gain further insights into the corner kick strategies.

3.1. Discovery of Tactics with Sequitur

Of the 6541 corner kick plays that end in the PM region, there are 1567 indirect corner kick plays. These plays contain between two and 35 symbols, with a median (50th percentile) of three symbols per play. From such plays, Sequitur created a grammar with 171 rules (tactics). Table 2 shows the summary statistics for the tactics based on the three metrics introduced in Section 2.2.3: frequency, length, and success rate.

From Table 2, we observe that the usage of tactics tends to fall around the median value of three plays. Notably, the most frequent tactic appears in 290 indirect corner kick plays. Moreover, the median length of the tactics is three symbols, and typical lengths (i.e., without considering outliers) lie below six symbols. We can also observe that some of the tactics found had a success rate as high as 1. The following section presents the most relevant tactics according to each metric. Furthermore, we also show those where we found a statistically significant association between the use (or not) of a tactic and the classes.

3.1.1. Relevant Tactics

From the practitioner’s perspective, there are different known ways to execute a corner kick. However, there are hardly any precise or standard definitions for these different types of corner kick executions. Instead, these are typically described at a high level in various non-academic sources (e.g., [69,70,71,72,73,74,75]). For each of the highlighted tactics, we look for similarities with known ways of executing a corner kick described in the literature.

Frequent tactics. Table 3 shows the most frequent tactics; due to their frequency (more than 50 plays), these can be considered the most representative tactics in our plays. Additionally, Figure 8 shows two examples of plays that use these tactics; each play is shown in a different color. For ease of reference, we assigned a short name to each of the tactics found based on their similarities with the consulted literature.

We observed that Tactic 85 (Figure 8a) resembles a short corner kick in which the team passes the ball to the region near the flag post and then follows it with a pass toward the penalty box [69,70,71,72]; we name this tactic short. Tactic 80 (Figure 8b) is a variation of the short corner kick in which two passes are made within the region near the flag post; we name this tactic short variation. Tactic 111 (Figure 8c) does not resemble any of the behaviors in the consulted literature; we name it Rebound to capture the fact that these kinds of plays contain a Ball movement event (likely a loose ball) followed by a pass toward the penalty box. Tactic 44 (Figure 8d) resembles the near-post corner kick execution [70]: the ball is initially delivered to the left side of the penalty box and then moves towards a teammate at the center of the penalty box; we name this tactic near post. Tactic 141 (Figure 8e) resembles the far-post corner kick execution [70]: the team passes towards the right side of the penalty box and then moves the ball towards a teammate at the center of the penalty box; we refer to this tactic as Far post. Lastly, Tactic 122 (Figure 8f) is similar to the direct corner kick tactic, but it makes one additional pass within the middle of the penalty box before the defining outcome; we name it penalty box. It is worth noting that while the described tactics are two or three events long, the plays in which they are used can be longer. For example, Figure 8e describes a four-event-long play (in blue) that makes use of the Far post tactic, which on its own is only two events long.

Longest tactic. Interestingly, the longest tactic is ten symbols long and describes a sequence of ten passes through region Backfield (B), which we found in only two plays in our data set. Figure 9 shows one of these plays, which resembles the known switch of play game sequence [76], in which the ball travels from one side of the field to the other. However, due to the large size of region B, there could be many other sequences through the region that do not necessarily resemble a switch of play; we name this tactic 10-touch backfield, to capture the ten passes that occur within this region. The inability of this tactic to describe whether we are in the presence of a switch of play or not is tied to the lack of granularity in the field division introduced in the previous section, suggesting that improvements to this representation might result in more intuitive and easier-to-understand tactics.

Successful tactics. Since the success rate of less frequent tactics can be questionable, we opted to look for statistically significant associations between the use (or not) of a tactic and its success (or not) using a chi-squared test of independence with a significance level of 0.05. Table 4 shows the tactics for which a statistically significant association was found using this approach. Still, for reference, Table 5 provides the success rate of the most representative tactics identified in Table 3.

Notice that statistically significant tactics from Table 4 show many similarities with the most representative tactics from Table 3. For example, Tactic 44 is similar to Tactic 30; they both start with a pass to the PL region. However, for statistically significant tactics, the ball travels towards the middle of the penalty box (the PM region) through an implicit ball movement (instead of a pass). Interestingly, tactics with implicit ball movements tend to yield higher success rates than those with explicit ball movement. However, the comparison could be biased due to the difference in the number of plays using each tactic. Still, the statistically significant associations found for tactics in Table 4 suggest that implicit movements of the ball (likely loose balls or sprints) might lead to more successful than failed plays.

Possibly, the main limitation of our approach is related to the nature of the data we used, which only captures on-the-ball actions. However, off-the-ball actions are crucial for defining other known ways to execute a corner kick (e.g., many of those defined in [75]).

3.1.2. Play Compression

The play compression characteristics of our output are illustrated in Figure 10 for the four-event play (blue) in Figure 8e. Notice that after executing Sequitur, we can express the play’s symbolic representation in terms of two rules of our grammar (R1 and R141); this creates a compressed representation of our play. After assigning a short name to each of these tactics, we can express the play using natural language; we call this representation the high-level play. In our example, our play can be described in natural language as two touches through the backfield followed by a Far post delivery. Moreover, since the original play is expressed by four symbols and the compressed play is expressed only by two, we say the play is compressed by a factor of $4 / 2 = 2$.

Previous studies [45,58] have suggested that compression and understanding are strongly related. In our work, we observed that representing plays using tactics aids in explaining them with terminology that is informative to the game’s actors. Table 6 summarizes the compression characteristics of the output of the algorithm.

Notably, the mean compression factor of 2.61 shows that through the tactics found, most plays can be expressed roughly by half of the symbols compared to the number of symbols in the original symbolic plays, implying that the tactics we have found can also lead to a more concise representation of plays.

3.2. Discovery of Strategies with PBC4cip

In this section, we describe the results of applying the contrast pattern mining approach in Section 2.3 to assess the impact of contextual conditions on the discovered tactics. After running PBC4cip in a per-tactic fashion and applying the filtering strategy outlined in Section 2.3.3, we were left with contrast patterns for only 6 out of the 172 tactics initially considered. This corresponds to a total of 747 contrast patterns, which represents a reduction of more than 90% in the total number of patterns we found before the filtering stage. We can partly explain this reduction by the infrequent use of certain tactics in our plays. In such cases, the contrast patterns found cover only a limited number of plays, often failing to pass the statistical significance tests. Table 7 provides summary statistics for the tactics in which contrast patterns prevailed.

We can see that the direct corner kick tactic is the most frequent tactic, as well as the tactic in which the algorithm finds the largest number of contrast patterns. The tactic is used in 4974 out of 6541 corner kicks that conclude within the Penalty Middle (PM) region, of which 1515 are successful plays and 3459 are failed ones. After the filtering and statistical testing, we identified 650 contrast patterns for these plays. Among these patterns, 267 had higher support for successful plays, while 383 had higher support for failed plays. Notice that, for the short variation, Rebound, and Far post tactics, we found contrast patterns for the success class but not for the failed class, implying that, for these tactics, we can describe patterns that lead to more successful than failed plays but not the other way around. It is worth noting that many of the tactics in Table 7 correspond to the most representative (frequent) tactics in Table 3; tactics with limited information have been automatically discarded.

3.2.1. Favorable Conditions for Tactic Application

Table 8 shows the most favorable conditions for applying tactics based on the selected contrast patterns for each tactic. The table shows the support of the pattern in each class, in descending order based on the support differences. For ease of reference, we have assigned a unique identifier (CPID) to each contrast pattern.

As part of our research, we aimed to validate whether favorable conditions for applying one tactic are also favorable for applying another tactic. While a visual inspection of the patterns in Table 8 suggests that the most favorable conditions are not the same across the different tactics, we can further investigate this relationship by performing an inter-tactic analysis of the patterns, i.e., assessing the quality of each pattern across the tactics under study. To this end, we evaluated each contrast pattern from Table 8 in the plays from other tactics. This evaluation aimed to determine how well these conditions could distinguish between successful and failed plays in other tactics (based on the support difference). Table 9 shows the support difference of each contrast pattern across the tactics from Table 7; the tactic where the largest support difference occurs is marked in bold. In all cases, we found that the tactic where the largest support difference occurred is where the contrast pattern originated.

3.2.2. Description of Favorable Conditions

In this section, we show how to describe the most favorable conditions for tactic application using natural language. Coaches and practitioners may use their expertise to gather insights, create plausible explanations for the observed behaviors, and pose relevant hypotheses that can help them support decision making. Appendix A provides summary statistics for the different contextual factors, which can help to better understand the patterns.

For the Far post tactic (CPID = 1), plays which were executed by offensive team players whose average height was below 191 cm (play avg. off. height < 191 cm), which at the time of execution were not losing the game (goal difference ≥ 0), and which were facing a team whose average height was above 183 cm (team avg. def. height ≥ 183 cm) led to more successful than failed plays. In particular, these conditions were present in 83% (15/18) of successful plays using this tactic, but only in 19% (8/43) of the failed ones, resulting in a support difference of 0.64. Notice that this tactic is used in 61 plays, representing approximately 3.9% (61/1567) of the indirect corner kick plays that conclude within the Penalty Middle region. However, since the direct corner kick tactic has a much greater usage, this tactic represents roughly 1% (61/6541) of the total corner kick plays that conclude within this region.

For the Rebound tactic (CPID = 2), plays that were executed in less than ten seconds (play duration < 10 s), against teams whose average market value was 12.15 M€ or less (team avg. def. market val. ≤ 12.15 M€), and whose average age was 27 years or more (team avg. def. age ≥ 27), led to more successful than failed plays. These conditions were present in 83% (15/18) of the successful plays using this tactic but only in 26% (21/81) of the failed ones, resulting in a support difference of 0.57, the second largest across our tactics. Notice that this tactic is used in 99 plays, representing around 6.3% (99/1567) of the indirect corner kick plays that conclude within the Penalty Middle region. However, since the direct corner kick tactic has a much greater usage, this tactic represents roughly 1.5% (99/6541) of the total corner kick plays that conclude within this region. Interestingly, unlike many other tactics, the offensive or defensive team’s heights are not part of the most favorable conditions for applying this tactic. Teams without particularly tall players could find it worthwhile to further investigate the potential benefits of using this tactic more often.

For the near post tactic (CPID = 3), plays that were executed during the early stages of the tournament (tournament progress ≤ 0.85) by offensive team players whose average height was 177 cm or more (play avg. off. height ≥ 177 cm), and whose average market value was above 6.15 M€ (play avg. off. market val. > 6.15 M€), led to more successful than failed plays. These conditions were present in 85% (22/26) of the successful plays using this tactic but only in 36% (25/70) of the failed ones, resulting in a support difference of 0.49, the third largest across our tactics. Notice that this tactic is used in 96 plays, representing around 6.1% (96/1567) of the indirect corner kick plays that conclude within the Penalty Middle region. However, since the direct corner kick tactic has a much greater usage, this tactic represents roughly 1.5% (96/6541) of the total corner kick plays that conclude within this region.

For the direct tactic (CPID = 4), plays that were executed by offensive team players whose average height was 186 cm or more (play avg. off. height ≥ 186 cm) in less than six seconds (play duration < 6 s) and in which the goalkeeper did not attack the ball (Goalkeeper leaving line = False), led to more successful than failed plays. These conditions were present in 59% (887/1515) of successful plays using this tactic, compared to only 22% (777/3459) of failed plays, resulting in a support difference of 0.37, the fourth largest across our tactics. Notice that this tactic is used in 4974 plays, representing roughly 76% (4974/6541) of the total number of corner kick plays that conclude within the Penalty Middle region.

With this pattern, we show that in some cases, some conditions may be more interesting than others in determining the success of a tactic. For example, the condition in which the goalkeeper does not leave the line is present in over 98% of plays using this tactic (see Appendix A). Therefore, this attribute may be less critical in anticipating the success of the tactic. Furthermore, successful plays tend to be shorter than six seconds. However, the duration of a play is often beyond the control of the offensive team and is likely a consequence of the outcome. Therefore, from a practical standpoint, the player’s height attribute may be more relevant than the other conditions. A correct interpretation of the results can be crucial for effectively supporting decision making in the field.

In the case of the short variation tactic (CPID = 5), plays that were executed by offensive team players whose average height was 177 cm or more (play avg. off. height ≥ 177 cm) in roughly less than 19 s from the time the corner kick was awarded (preparation time < 19 s) and in which there was a fight for the ball (num. duels ≥ 1), led to more successful than failed plays. These conditions were present in 50% (13/26) of the successful plays using this tactic but only in 13% (12/95) of the failed ones, resulting in a support difference of 0.37, the fourth largest difference across our tactics (tied with the support difference for the direct corner kick). Notice that this tactic is used in 121 plays, representing around 7.7% (121/1567) of the indirect corner kick plays that conclude within the Penalty Middle region. However, since the direct corner kick tactic has a much greater usage, this tactic represents roughly 1.8% (121/6541) of the total corner kick plays that conclude within this region. Interestingly, this is the only tactic in which the preparation time appears among the most favorable conditions.

For the short tactic (CPID = 6), plays that were executed by offensive team players whose average height was 174 cm or more (play avg. off. height ≥ 174 cm) in less than eight seconds (play duration < 8 s) and in which there was a fight for the ball (num. duels > 1), led to more successful than failed plays. These conditions were present in 55% (40/73) of the successful plays using this tactic but only in 21% (45/217) of the failed ones, resulting in a support difference of 0.34, the fifth largest difference across our patterns. Notice that this tactic is used in 290 plays, representing around 18.5% (290/1567) of the indirect corner kick plays that conclude within the Penalty Middle region. However, since the direct corner kick tactic has a much greater usage, this tactic represents approximately 4.4% (290/6541) of the total corner kick plays that conclude within this region.

We can see that the most favorable conditions for applying this tactic are similar to those of the direct corner kick tactic. However, for the direct corner kick tactic, success is associated with play durations below six seconds, whereas for the short tactic, success is associated with play durations below eight seconds. It is reasonable to consider that the successful execution of the direct corner kick tactic tends to be faster than that of the short corner kick tactic (as the latter requires one extra pass). This behavior could help explain why, as shown in Table 9, the most favorable conditions for applying the short tactic also led to similar support differences when evaluated in plays using the direct corner kick tactic and why the reverse relationship does not hold.

3.2.3. Unfavorable Conditions for Tactic Application

Table 10 shows the most unfavorable conditions for applying the tactics based on the selected contrast patterns for each tactic. Notably, not all tactics with successful contrast patterns have failed contrast patterns. Furthermore, Table 11 shows the support difference of each of these contrast patterns across all tactics where contrast patterns prevailed (Table 7); the tactic where the largest support difference occurs is marked in bold. Similar to the results in Table 9, the largest support difference occurs in the tactic where the contrast pattern originated. When evaluated in other tactics, the support difference significantly drops, suggesting that the discovered unfavorable conditions tend to be specific to a given tactic.

3.2.4. Description of Unfavorable Conditions

For the near post tactic, plays which were executed by offensive team players whose average market value was 31.15 M€ or less (play avg. off. market val. ≤ 31.15 M€), in which the offensive team’s goal difference at the time of execution was below two (goal difference < 2), and which were facing a team whose goalkeeper market value was 33.5 M€ or less (def. goalkeeper market val. ≤ 33.5 M€), led to more failed than successful plays. These conditions were present in 83% (58/70) of the failed plays using this tactic but only in 42% (11/26) of the successful ones, resulting in a support difference of 0.41, the largest among the unfavorable conditions in our tactics. Interestingly, this pattern depends on the market value of both the defensive and offensive teams. However, in both cases, the threshold is set at values close to the 90th percentile of the attribute, likely indicating that these conditions are applicable for most of the cases, except for those with atypically large market values.

For the direct tactic, plays executed by relatively short offensive team players (178 cm ≤ play avg. off. height < 186 cm), where fewer than two duels occurred (num. duels < 2), led to more failed than successful plays. These conditions were present in 51% (1760/3459) of the failed plays using this tactic but only in 12% (178/1515) of the successful ones, resulting in a support difference of 0.39, the second largest across the unfavorable conditions in our tactics.

Lastly, for the short tactic, plays where fewer than two duels occurred (num. duels < 2), executed against teams whose average market value was greater than 2.15 M€ (team avg. def. market val. > 2.15 M€) and whose average age was 25 years or more (team avg. def. age ≥ 25 years), led to more failed than successful plays. These conditions were present in 52% (113/217) of the failed plays using this tactic but only in 18% (13/73) of the successful ones, resulting in a support difference of 0.34, the third largest across the unfavorable conditions in our tactics. Interestingly, for this tactic, an increased defensive age (in combination with other contextual conditions) is linked to an increased number of failed plays, or conversely, to a positive impact on defensive performance. At the same time, in the previous section, we saw that for the Rebound tactic, an increased defensive age (in combination with other contextual conditions) negatively impacted defensive performance. This behavior suggests a benefit in performing an analysis that treats different tactics separately.
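The support differences quoted in this section follow directly from the reported counts. The short sketch below recomputes them; the function and variable names are ours and serve only as an illustration, while the counts are the ones given in the text.

```python
def support(covered, total):
    """Fraction of the plays in a class that satisfy the pattern."""
    return covered / total


def support_difference(failed, successful):
    """Support in the failed class minus support in the successful class."""
    return support(*failed) - support(*successful)


# (covered, total) counts reported in the text: failed class, then successful class.
REPORTED_COUNTS = {
    "near post": ((58, 70), (11, 26)),
    "direct": ((1760, 3459), (178, 1515)),
    "short": ((113, 217), (13, 73)),
}

for tactic, (failed, successful) in REPORTED_COUNTS.items():
    print(tactic, round(support_difference(failed, successful), 2))
# near post 0.41
# direct 0.39
# short 0.34
```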

4. Discussion

By analyzing the corner kick plays that conclude within the Penalty Middle region in our data set, we found that teams strongly prefer the direct corner kick execution. Furthermore, by looking for recurrent sequences of events in the 1567 indirect corner kicks that conclude within the same region, we found that, in many cases, the discovered tactics resemble known ways to execute a corner kick described in the practitioner literature. Interestingly, we observed that out of the 171 tactics found by Sequitur, only 6 are used in more than 50 plays, which we considered to be the most representative tactics in our data set. This result suggests that there is little variation in the tactics adopted by offensive teams across indirect corner kicks targeted towards the Penalty Middle region. Furthermore, we observed that tactics with a statistically significant association with the success of a play share many similarities with the most representative tactics. However, these other tactics involve implicit movements of the ball (as opposed to explicit passes) when reaching the scoring zone. Using our approach, it is possible to describe the step-by-step execution of different types of corner kicks (as opposed to the high-level descriptions found in other non-academic sources), and at the same time obtain insights into their usage which could help support decision making in the field. For example, coaches could create training sessions to practice defending at least against the six most common types of corner kick executions. Furthermore, coaches could identify less frequent but highly successful tactics to gather more data and validate whether these could lead to higher success rates than other common tactics.

Additionally, we used contrast pattern mining to identify differences between successful and unsuccessful plays involving a given tactic. We filtered and statistically tested the patterns, resulting in contrast patterns for only six out of the 172 considered tactics. Next, we selected the highest-quality patterns for each of these tactics to infer the most favorable and unfavorable conditions for tactic application. We observed that only half of the considered contextual factors appear within the most favorable or unfavorable conditions for tactic application. Not surprisingly, the average height of offensive players is one of the most prevalent attributes among the discovered patterns. However, the patterns allow us to obtain insights about the influence of the attributes. For example, by specifying a height threshold, we can identify corner kick plays that are more promising. Additionally, we can quantitatively assess how much more successful these plays were compared to failed ones when such conditions were met. Practitioners may take our results to derive these kinds of insights as well as plausible explanations for the observed behaviors. For example, the presence of the player’s height in the patterns is likely explained by the ability of taller players to fight for aerial balls, the market value may be linked to the player’s skill, and a short preparation time may be linked to a possible surprise factor during the execution.

Furthermore, we found that the most favorable and unfavorable conditions for applying the tactics are not the same across different tactics, suggesting a benefit in performing an analysis that treats different tactics separately. For example, we found that for the Rebound tactic, an increased defensive player age (in combination with other contextual conditions) appears to impact the performance of the defensive team negatively. On the other hand, for the short corner kick, an increased defensive player age appears to impact the performance of the defensive team positively. This behavior may be attributed, for example, to differences in the physical demands of various tactics or to the importance of player experience over physical attributes for other tactics.

Interestingly, we found that the quality of the contrasts we can make between successful and failed plays is different for different tactics. However, direct comparisons of the quality of the contrasts across different tactics could be biased as our analysis considers a different number of plays for each tactic. Therefore, our results might be used to obtain valuable insights for a given tactic of interest but not so much to choose the best tactic among the tactics analyzed. Still, practitioners could use our results to guide further data collection to better compare tactics of interest by using a similar number of plays for each.
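One way to see why unequal sample sizes complicate such comparisons is to look at how uncertain a support estimate is for a small versus a large class. The sketch below uses the standard error of a proportion, a textbook approximation that is not part of the analysis reported here, with the successful-play counts given in Section 3.2.4; the function name is an assumption of this sketch.

```python
import math


def proportion_standard_error(covered, total):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    p = covered / total
    return math.sqrt(p * (1 - p) / total)


# Successful-play counts reported for two tactics in Section 3.2.4.
near_post_se = proportion_standard_error(11, 26)    # 26 successful plays
direct_se = proportion_standard_error(178, 1515)    # 1515 successful plays

# The estimate based on 26 plays is far less precise than the one based on 1515.
print(round(near_post_se, 3), round(direct_se, 3))
```

Under this approximation, the support measured on the near post tactic carries an uncertainty roughly an order of magnitude larger than the one measured on the direct tactic, which is consistent with the caution expressed above about ranking tactics by contrast quality.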

Unlike previous works, our approach can capture how the interaction between multiple contextual factors impacts the outcome of a corner kick, and the results can be easily explained to others in natural language. At the same time, it integrates sequences of ball movements into the analysis. Overall, our work provides a new perspective for the analysis of strategies for the corner kick.

5. Future Work

Possibly, the main limitation of our approach is linked to the nature of the data we used, which mainly captures on-the-ball actions. Section 2.2 describes how this limitation makes it impossible for our tactics to describe complex corner kick executions that rely on relevant off-the-ball actions. Furthermore, this limitation does not allow us to extract contextual factors that may be relevant for differentiating between successful and failed plays. Previous works presented in Section 1.1 often study contextual factors that capture off-the-ball actions; this is possible thanks to their custom data collection processes. However, such attributes cannot be extracted from the type of data we considered. Including additional sources of information to capture relevant off-the-ball contextual factors could help increase the quality of the contrasts we can describe between successful and failed plays.

We suggest using player tracking data to extract relevant events from players without the ball, such that we can describe the step-by-step execution of more complex types of corner kicks, allowing us to gain insights into their usage. Moreover, player tracking data could also be used to derive contextual information involving off-the-ball actions (e.g., number of defensive players on the post, number of attackers, and type of defensive marking), which, based on the results of previous works, could be worth studying under our framework. Alternatively, event data providers may consider including this information natively, making event data sets more suitable for automated strategy analyses.

Additionally, using our current data sources, it may be possible to investigate regional differences in the use of tactics and strategies, for example, to determine if certain tactics are more common in the Spanish first division than in the English first division or if certain tactics are more effective in a particular league. Moreover, we can further examine the impact of players on the outcome of a corner kick, for example, by studying how much more successful a tactic can be given that a specific player is involved or by studying player-specific attributes that could influence the success of a play (e.g., jumping power and dribbling skills, among others).

Finally, future work could explore how our approach could be adapted to study strategies beyond the corner kick use case. In addition, closely integrating football experts in the development of further work could help guide and guarantee the practical applicability of our results.

Author Contributions

Conceptualization, O.M., R.M., L.C.-S. and J.E.R.-M.; methodology, O.M., R.M., L.C.-S. and J.E.R.-M.; software, O.M. and L.C.-S.; validation, O.M.; formal analysis, O.M.; investigation, O.M.; resources, R.M. and L.C.-S.; data curation, O.M.; writing—original draft preparation, O.M.; writing—review and editing, O.M., R.M. and L.C.-S.; visualization, O.M.; supervision, R.M. and L.C.-S.; project administration, O.M.; funding acquisition, R.M. All authors have read and agreed to the published version of the manuscript.

Funding

The research reported here was supported by Consejo Nacional de Ciencia y Tecnología (CONACYT) studentship 1148615 to the first author.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in FigShare at https://figshare.com/collections/Soccer_match_event_dataset/4415000/5, reference number [3]. Other publicly available datasets were analyzed in this study. These data can be found in [54].

Acknowledgments

We are grateful to the members of the Advanced Artificial Intelligence Research Group @ Tecnologico de Monterrey for their useful comments on an earlier draft of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Contextual Information

In Section 2.3, we present a list of contextual factors extracted from the metadata of the events in our plays and from complementary data sets from [3,54]. In this appendix, we provide a detailed description of these attributes (Table A1 and Table A2) and their summary statistics (Table A3 and Table A4).

Table A1. Description of categorical contextual factors.

Variable	Category	Description
Match period	Game	The period of the match (first half = 1H, second half = 2H) when the corner kick is executed.
Offside presence	Play	This variable indicates whether there was an offside call during the current corner kick execution (true) or not (false).
Goalkeeper leaving line	Use case	This variable indicates whether the goalkeeper leaves the goal line (true) or not (false) during the current corner kick execution.
Origin corner	Use case	This variable indicates whether the corner kick was taken from the left (L) or right (R) flank.
Preferred foot	Use case	This variable indicates whether the preferred foot of the player executing the corner kick is the left (L) foot, right (R) foot, both (B), or unknown (U).
High corner kick	Use case	This variable indicates whether the initial pass in the corner kick execution is a high ball (true) or not (false).

Table A2. Description of numerical contextual factors.

Variable	Category	Description
Team avg. offensive height	Team	Average height of all players (currently on the field) in the offensive team in centimeters.
Team avg. defensive height	Team	Average height of all players (currently on the field) in the defensive team in centimeters.
Team avg. offensive age	Team	Average age of all players (currently on the field) in the offensive team in years.
Team avg. defensive age	Team	Average age of all players (currently on the field) in the defensive team in years.
Defensive goalkeeper market value	Team	Market value of the defensive team goalkeeper in million euros. The market value is computed as the average market value for the year the match takes place.
Team avg. offensive market value	Team	Average market value of all players (currently on the field) in the offensive team in million euros. For each player, the market value is computed as the average market value for the year the match takes place.
Team avg. defensive market value	Team	Average market value of all players (currently on the field) in the defensive team in million euros. For each player, the market value is computed as the average market value for the year the match takes place.
Play avg. offensive market value	Play	Average market value of all offensive players involved in the corner kick play in million euros. For each player, the market value is computed as the average market value for the year the match takes place.
Play avg. offensive height	Play	Average height of all offensive players involved in the corner kick play (except for the kicker) in centimeters.
Play avg. offensive age	Play	Average age of all offensive players involved in the corner kick play in years.
Preparation time	Play	Time between the corner kick being awarded and the corner kick being executed in seconds. It is computed as the time delta between the first event in a corner kick play and the last event before it.
Number of duels	Play	Number of duel events in a corner kick play (before play preprocessing).
Duration	Play	Duration of a corner kick play in seconds. It is computed as the time delta between the last event in a play and the first event (before play preprocessing).
Length	Play	Number of pass and ball movement events in a corner kick play.
Clock time	Game	The number of minutes that have elapsed since the beginning of the current half until the corner kick execution.
Goal difference	Game	The difference between the number of goals scored by the offensive team and the goals scored by the defensive team.
Progress	Tournament	Progress of the tournament at the time of corner kick execution. Computed as the current tournament week divided by the total number of weeks of the tournament.
Advantage	Tournament	The difference between the number of matches won by the offensive team and the number of matches won by the defensive team throughout the tournament at the time of corner kick execution.
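Several of the numerical factors in Table A2 are simple derived quantities. The sketch below restates four of them as code, following the descriptions in the table; the function names, argument names, and example values are hypothetical and do not reflect the field names of the underlying event data.

```python
def preparation_time(first_play_event_sec, previous_event_sec):
    """Seconds between the last event before the play and the first event of the play."""
    return first_play_event_sec - previous_event_sec


def duration(event_times_sec):
    """Seconds between the first and the last event of a play."""
    return event_times_sec[-1] - event_times_sec[0]


def progress(current_week, total_weeks):
    """Current tournament week divided by the total number of weeks."""
    return current_week / total_weeks


def goal_difference(offensive_goals, defensive_goals):
    """Goals scored by the offensive team minus goals scored by the defensive team."""
    return offensive_goals - defensive_goals


# Made-up example: the previous event happened at 1200.0 s and the corner kick
# play consists of three events.
play_times = [1223.0, 1224.5, 1226.0]
print(preparation_time(play_times[0], 1200.0))  # 23.0
print(duration(play_times))                     # 3.0
print(progress(19, 38))                         # 0.5
print(goal_difference(1, 2))                    # -1
```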

Table A3. Categorical attribute statistics for all plays.

Variable	Count	Num. Unique	Mode	Mode Freq
Goalkeeper leaving line	6541	2	f	6416
Match period	6541	2	2H	3405
Offside presence	6541	2	f	6501
Origin corner	6541	2	L	3519
Preferred foot	6541	4	R	4024
High corner kick	6541	2	t	5498

Table A4. Numerical attribute statistics for all plays.

Variable	Count	Mean	Std	Min	10%	25%	50%	75%	90%	Max
Team avg. offensive height	6541	182.58	1.93	176	180	181	183	184	185	189
Team avg. defensive height	6541	182.67	1.93	176	180	181	183	184	185	189
Team avg. offensive age	6541	26.96	1.5	22	25	26	27	28	29	33
Team avg. defensive age	6541	27	1.5	22	25	26	27	28	29	33
Defensive goalkeeper market value	6399	7.39	10.49	0.05	0.63	1.58	4.05	8.1	17.4	63
Team avg. offensive market value	6541	11.78	11.95	0.5	2	3.5	7	15.9	28.94	68.5
Team avg. defensive market value	6540	9.14	9.76	0.5	1.8	3	5.8	10.7	22.9	68.5
Play avg. offensive market value	6426	14.18	17.73	0.1	1.7	3.2	7.6	17.4	36	162
Play avg. offensive height	6541	184.03	4.88	163	179	181	184	187	190.4	203
Play avg. offensive age	6541	27.14	2.86	18	24	25	27	29	31	38
Preparation time	6541	24.46	14.82	0	11	17	23	30	38	614
Number of duels	6541	1.56	1.79	0	0	0	2	2	4	14
Clock time	6541	24.53	13.48	0	6	13	24	36	43	54
Duration	6541	4.03	4.22	0	1	2	3	5	8	107
Goal difference	6541	−0.06	1.07	−5	−1	−1	0	0	1	7
Length	6541	1.42	1.1	1	1	1	1	1	3	32
Progress	6541	0.51	0.29	0	0.1	0.3	0.5	0.8	0.9	1
Advantage	6541	0.06	2.01	−14	0	0	0	0	1	14

References

Harell, A.; Bajíc, I.V. The Data Gap in Sports Analytics and How to Close It. In Proceedings of the Artificial Intelligence in Team Sports Workshop at The Thirty Fourth AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
Decroos, T. Soccer Analytics Meets Artificial Intelligence: Learning Value and Style from Soccer Event Stream Data. Ph.D. Thesis, KU Leuven, Leuven, Belgium, 2020.
Pappalardo, L.; Cintia, P.; Rossi, A.; Massucco, E.; Ferragina, P.; Pedreschi, D.; Giannotti, F. A public data set of spatio-temporal match events in soccer competitions. Sci. Data 2019, 6, 236.
Secareanu, A. Football Events. Available online: https://www.kaggle.com/datasets/secareanualin/football-events (accessed on 1 May 2023).
Agarwal, H. Football Analytics (Event Data). Available online: https://www.kaggle.com/datasets/hardikagarwal1/football-analytics-event-data-statsbomb (accessed on 1 May 2023).
Strategy. Merriam-Webster Online Dictionary. 2023. Available online: https://www.merriam-webster.com/dictionary/strategy (accessed on 28 March 2023).
Strategy. Oxford Essential Dictionary of the U.S. Military. 2002. Available online: https://www.oxfordreference.com/view/10.1093/acref/9780199891580.001.0001/acref-9780199891580 (accessed on 28 March 2023).
Bellay, A. What Is Strategy? 2019. Available online: https://straty.com/what-is-strategy/ (accessed on 28 March 2023).
Yiannakos, A.; Armatas, V. Evaluation of the goal scoring patterns in European Championship in Portugal 2004. Int. J. Perform. Anal. Sport 2006, 6, 178–188.
Casal, C.A.; Maneiro, R.; Ardá, T.; Losada, J.L.; Rial, A. Analysis of Corner Kick Success in Elite Football. Int. J. Perform. Anal. Sport 2015, 15, 430–451.
Sainz de Baranda, P.; López-Riquelme, D.; Ortega, E. Criterios de eficacia ofensiva del saque de esquina en el Mundial de Alemania 2006: Aplicaciones al entrenamiento. Rev. Española Educ. Física Deport. 2011, 395, 47.
Hobbs, J.; Ruiz, H.; Wei, X.; Lucey, P. Mythbusting set-pieces in soccer. In Proceedings of the 12th Annual MIT Sloan Sports Analytics Conference, Boston, MA, USA, 23–24 February 2018.
Flores, J.; García-Manso, J.; Martin-Gonzalez, J.; Ramos, E.; Arriaza, E.; Da Silva-Grigoletto, M. Análisis y evaluación del lanzamiento de esquina (córner) en el fútbol de alto nivel. Rev. Andal. Med. Deport. 2012, 5, 140–146.
Zileli, R.; Söyler, M. Analysis of corner kicks in FIFA 2018 World Cup. J. Hum. Sport Exerc. 2022, 17, 156–166.
Gouveia, V.; Duarte, J.P.; Sarmento, H.; Freitas, J.; Rebelo-Gonçalves, R.; Amaro, N.; Matos, R.; Antunes, R.; Field, A.; Monteiro, D. Systematic Observation of Corner Kick Strategies in Portuguese Football Players. Sustainability 2022, 14, 896.
Mitrotasios, M.; Casal, C.; Armatas, V.; Losada, J.; Dios, R. Analysis of Corner Kick Success in Laliga Santander. Eur. J. Hum. Mov. 2021, 47, 8–22.
Beare, H.; Stone, J.A. Analysis of attacking corner kick strategies in the FA women’s super league 2017/2018. Int. J. Perform. Anal. Sport 2019, 19, 893–903.
Pulling, C. Long Corner Kicks in the English Premier League: Deliveries into the Goal Area and Critical Area. Kinesiol. Int. J. Fundam. Appl. Kinesiol. 2015, 47, 193–201.
Kubayi, A.; Larkin, P. Analysis of teams’ corner kicks defensive strategies at the FIFA World Cup 2018. Int. J. Perform. Anal. Sport 2019, 19, 809–819.
Lee, J.; Mills, S. Analysis of corner kicks at the FIFA Women’s World Cup 2019 in relation to match status and team quality. Int. J. Perform. Anal. Sport 2021, 21, 679–699.
Lames, M.; McGarry, T. On the search for reliable performance indicators in game sports. Int. J. Perform. Anal. Sport 2007, 7, 62–79.
Mackenzie, R.; Cushion, C. Performance analysis in football: A critical review and implications for future research. J. Sport. Sci. 2013, 31, 639–676.
Kröckel, P. Big Data Event Analytics in Football for Tactical Decision Support. Ph.D. Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany, 2019.
Gyarmati, L.; Anguera, X. Automatic Extraction of the Passing Strategies of Soccer Teams. arXiv 2015, arXiv:1508.02171.
Van Haaren, J.; Hannosset, S.; Davis, J. Strategy discovery in professional soccer match data. In Proceedings of the KDD-16 Workshop on Large-Scale Sports Analytics, San Francisco, CA, USA, 14 August 2016.
Provost, F.; Fawcett, T. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking; O’Reilly: Sebastopol, CA, USA, 2013.
Herold, M.; Goes, F.; Nopp, S.; Bauer, P.; Thompson, C.; Meyer, T. Machine learning in men’s professional football: Current applications and future directions for improving attacking play. Int. J. Sport. Sci. Coach. 2019, 14, 798–817.
Brooks, J.; Kerr, M.; Guttag, J.V. Using machine learning to draw inferences from pass location data in soccer. Stat. Anal. Data Min. 2016, 9, 338–349.
Decroos, T.; Van Haaren, J.; Davis, J. Automatic Discovery of Tactics in Spatio-Temporal Soccer Match Data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’18), London, UK, 19–23 August 2018; pp. 223–232.
Hirano, S.; Tsumoto, S. Grouping of soccer game records by multiscale comparison technique and rough clustering. In Proceedings of the Fifth International Conference on Hybrid Intelligent Systems (HIS’05), Rio de Janeiro, Brazil, 6–9 November 2005; p. 6.
Lucey, P.; Oliver, D.; Carr, P.; Roth, J.; Matthews, I. Assessing team strategy using spatiotemporal data. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; pp. 1366–1374.
Müller-Budack, E.; Theiner, J.; Rein, R.; Ewerth, R. “Does 4-4-2 exist?”—An Analytics Approach to Understand and Classify Football Team Formations in Single Match Situations. In Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, Nice, France, 21–25 October 2019; MMSports ’19. pp. 25–33.
Andrienko, G.; Andrienko, N.; Anzer, G.; Bauer, P.; Budziak, G.; Fuchs, G.; Hecker, D.; Weber, H.; Wrobel, S. Constructing Spaces and Times for Tactical Analysis in Football. IEEE Trans. Vis. Comput. Graph. 2019, 27, 2280–2297.
Gudmundsson, J.; Wolle, T. Football analysis using spatio-temporal tools. Comput. Environ. Urban Syst. 2014, 47, 16–27.
Feuerhake, U. Recognition of Repetitive Movement Patterns—The Case of Football Analysis. ISPRS Int. J. Geo-Inf. 2016, 5, 208.
Beernaerts, J.; De Baets, B.; Lenoir, M.; Van de Weghe, N. Spatial movement pattern recognition in soccer based on relative player movements. PLoS ONE 2020, 15, e0227746.
Cintia, P.; Rinzivillo, S.; Pappalardo, L. A network-based approach to evaluate the performance of football teams. In Proceedings of the Machine Learning and Data Mining for Sports Analytics Workshop, Porto, Portugal, 7–11 September 2015.
Peña, J.L. A Markovian model for association football possession and its outcomes. arXiv 2014, arXiv:1403.7993.
Wang, Q.; Zhu, H.; Hu, W.; Shen, Z.; Yao, Y. Discerning Tactical Patterns for Professional Soccer Teams: An Enhanced Topic Model with Applications. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015.
Rein, R.; Memmert, D. Big data and tactical analysis in elite soccer: Future challenges and opportunities for sports science. SpringerPlus 2016, 5, 1410.
Carling, C.; Williams, A.M.; Reilly, T. Handbook of Soccer Match Analysis; Routledge: London, UK, 2005.
Bakker, L.L. Visualizing Football Team Strategies and Player Performance. Ph.D. Thesis, Eindhoven University of Technology, Eindhoven, The Netherlands, 2015.
Fleischman, M. Grounding Language in Events. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2008.
Jalali, L.; Jain, R. Event Mining for Explanatory Modeling, 1st ed.; Association for Computing Machinery: New York, NY, USA, 2021.
Nevill-Manning, C.G. Inferring Sequential Structure. Ph.D. Thesis, University of Waikato, Hamilton, New Zealand, 1996.
Jacquemont, S. Contributions de l’inférence Grammaticale à la Fouille de Données Séquentielles. (Contributions of Grammatical Inference to Sequential Pattern Mining). Ph.D. Thesis, Jean Monnet University, Saint-Étienne, France, 2008.
Joshi, S.; Jadon, R.S.; Jain, R.C. Sequential Pattern Mining Using Formal language Tools. Int. J. Comput. Sci. Issues 2012, 9, 316.
Hingston, P. Using Finite State Automata for Sequence Mining; ECU Publications: Joondalup, WA, Australia, 2002.
Aguilar, R.; Alonso, L.; López, V.; Moreno, M.N. Incremental discovery of sequential patterns for grammatical inference. In Proceedings of the Workshop on Approaches and Applications of Inductive Programming (AAIP 2005), to be held in conjunction with the 22nd International Conference on Machine Learning (ICML 2005), Bonn, Germany, 7–11 August 2005; Kitzelmann, E., Olsson, R., Schmid, U., Eds.; 2005; pp. 59–67. Available online: https://cogsys.uni-bamberg.de/events/aaip05/proceedings.html (accessed on 24 November 2023).
Jenkins, G. Python SciKit Sequitur. 2021. Available online: https://github.com/grantjenks/scikit-sequitur (accessed on 24 November 2023).
Agencia EFE. Aguirre Sabe Sus Limitaciones. ESPN Deportes 2003. Available online: https://espndeportes.espn.com/nota?id=194662 (accessed on 29 March 2023).
De Obeso, Eugenio. Sin Técnica No Hay Táctica. Informador 2015. Available online: https://www.informador.mx/Ideas/Sin-tecnica-no-hay-tactica-20150618-0227.html (accessed on 29 March 2023).
Muglia, V. La Técnica y Entender el Juego, Claves del Golazo del Bayern. Olé 2020. Available online: https://www.ole.com.ar/tactica/analisis-gol-bayern-psg-muglia-tactica_0_H2_EJYFly.html (accessed on 29 March 2023).
David Cariboo. Football Data from Transfermarkt. 2022. Available online: https://www.kaggle.com/datasets/davidcariboo/player-scores (accessed on 24 November 2023).
Gonzalez, O.L. Supervised Classifiers Based on Emerging Patterns for Class Imbalance Problems. Ph.D. Thesis, Coordinación de Ciencias Computacionales National, Puebla, Mexico, 2017.
García-Borroto, M.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A. A survey of emerging patterns for supervised classification. Artif. Intell. Rev. 2014, 42, 705–721.
Loyola-González, O.; Medina-Pérez, M.A.; Choo, K.K.R. A Review of Supervised Classification based on Contrast Patterns: Applications, Trends, and Challenges. J. Grid Comput. 2020, 18, 797–845.
Espejel, A.H. Characterisation of Visitors and Description of Their Navigation Behaviour Using Web Log Mining Techniques. Master’s Thesis, Tecnológico de Monterrey, Estado de México, Mexico, 2021.
Cervantes, B.; Gómez, F.; Monroy, R.; Loyola-González, O.; Medina-Pérez, M.A.; Ramírez-Márquez, J. Pattern-Based and Visual Analytics for Visitor Analysis on Websites. Appl. Sci. 2019, 9, 3840.
García-Borroto, M.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A.; Medina-Pérez, M.A.; Ruiz-Shulcloper, J. LCMine: An efficient algorithm for mining discriminative regularities and its application in supervised classification. Pattern Recognit. 2010, 43, 3025–3034.
García-Borroto, M.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A. Finding the best diversity generation procedures for mining contrast patterns. Expert Syst. Appl. 2015, 42, 4859–4866.
Cañete Sifuentes, L. Multivariate PBC4cip. Available online: https://sites.google.com/view/leocanetesifuentes/software/multivariate-pbc4cip (accessed on 15 March 2023).
Loyola-González, O.; Medina-Pérez, M.A.; Martínez-Trinidad, J.F.; Carrasco-Ochoa, J.A.; Monroy, R.; García-Borroto, M. PBC4cip: A new contrast pattern-based classifier for class imbalance problems. Knowl.-Based Syst. 2017, 115, 100–109.
Cañete-Sifuentes, L.; Monroy, R.; Medina-Pérez, M.A.; Loyola-González, O.; Vera Voronisky, F. Classification Based on Multivariate Contrast Patterns. IEEE Access 2019, 7, 55744–55762.
Loyola-González, O.; Monroy, R.; Rodríguez, J.; López-Cuevas, A.; Mata-Sánchez, J.I. Contrast Pattern-Based Classification for Bot Detection on Twitter. IEEE Access 2019, 7, 45800–45817.
Webb, G.; Butler, S.; Newlands, D. On Detecting Differences Between Groups. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003.
Bay, S.; Pazzani, M. Detecting Group Differences: Mining Contrast Sets. Data Min. Knowl. Disc. 2002, 5, 213–246.
Dong, G.; Bailey, J. (Eds.) Contrast Data Mining: Concepts, Algorithms, and Applications; CRC Press: Boca Raton, FL, USA, 2013.
Soccer Source Coaching. The Complete Guide to the Best Attacking Corner Kick Tactics in Soccer. 2021. Available online: https://soccersourcecoaching.com/attacking-corner-kick-tactics-in-soccer/ (accessed on 12 September 2023).
Harves, J.C. Types of Corner Kicks. 2023. Available online: https://coachingamericansoccer.com/tactics-and-teamwork/types-of-corner-kicks/ (accessed on 12 April 2023).
Green Star Media Ltd. 11 Secrets of Successful Corners; Green Star Media Ltd.: Guildford, UK, 2010.
Football Buneski. How to Attack Corner Kicks in Football? (Tactical Analysis). 2023. Available online: https://footballbunsekicom.com/set-piece/how-to-attack-corner-kicks-in-football-tactical-analysis/ (accessed on 12 September 2023).
Allmann, A.; Brenes, O.; Bryant, R.; Chapman, S.; Ellen Coleman, R.; Derse, E.; Ghotbi, A.; Ann Grandjean, E.; Harris, T.; Jackson, N.; et al. LA84 Foundation Soccer Coaching Manual; LA84 Foundation: Los Angeles, CA, USA, 1995.
Miller, J. Attacking Soccer; Human Kinetics: Champaign, IL, USA, 2014.
Englund, T. The Ultimate Book of Soccer Set Pieces: Strategies for Attack and Defense Restarts; Meyer & Meyer Sport (UK) Ltd.: Maidenhead, UK, 2022.
Brooks, R. Tactical Analysis: Switching Play to Create Opportunities. 2023. Available online: https://footballdna.co.uk/features/tactical-analysis-switching-play-to-create-opportunities/ (accessed on 18 September 2023).

Figure 1. Pass sequence examples with and without a position mismatch. A blue circle indicates an event’s initial position, while a smaller one indicates its final position. A blue dotted line connects the initial and final positions for ease of visualization. (a) Example play showing no position mismatch. (b) Example play showing a position mismatch.

Figure 2. Example addition of a synthetic ball movement event for an offensive open field play. The event is added to handle position mismatch between contiguous pass events (id = 1 and id = 3).

Figure 3. Example frame of reference transformation for a corner kick play starting at (x = 100, y = 0). (a) Play shown in its original frame of reference. (b) Play shown in the new frame of reference.

Figure 4. Proposed field representation. Under this representation, the field is divided into 12 regions and all corner kicks start at (x = 100, y = 100).

Figure 5. Mapping between tuples of our intermediate representation and the characters in our alphabet.

Figure 6. Summary of contextual factors extracted for our plays. The contextual factors have been grouped by the type of information that they capture.

Figure 7. Synthetic example showing the contrast pattern mining approach used to discover differences between the successful and failed plays in a per-tactic fashion.

Figure 8. Most representative tactics found for plays ending in the Penalty Middle region. Two examples are shown, in a different color, for each tactic. (a) Short corner kick tactic. (b) Short variation corner kick tactic. (c) Rebound corner kick tactic. (d) Near post corner kick tactic. (e) Far post corner kick tactic. (f) Penalty box corner kick tactic.

Figure 9. Longest tactic (10-touch backfield) appearing in events four through fourteen of a corner kick play.

Figure 10. Play compression example. First, we map the intermediate representation of the play into a symbolic play. Next, after executing Sequitur, we express the symbolic play in terms of grammar rules (compressed play). Finally, by naming each of the rules in the play, we can track its development using informative terminology (high-level play).

Table 1. Output grammar for synthetic example.

Grammar Rules
R0 → R1 | R1 | R2 | R2
R1 → R3 C
R2 → R3 D
R3 → A B
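
As a reading aid, the grammar in Table 1 can be expanded back into the sequence it encodes. The sketch below is a minimal illustration and not the authors' implementation: the dictionary layout and the `expand` helper are hypothetical, and the `|` symbol is assumed to be an ordinary terminal that separates plays.

```python
# Grammar from Table 1; each rule maps to the symbols on its right-hand side.
# Assumption: "|" is treated as a plain terminal (a separator between plays).
GRAMMAR = {
    "R0": ["R1", "|", "R1", "|", "R2", "|", "R2"],
    "R1": ["R3", "C"],
    "R2": ["R3", "D"],
    "R3": ["A", "B"],
}


def expand(symbol, grammar=GRAMMAR):
    """Recursively replace rule names with their right-hand sides."""
    if symbol not in grammar:
        return [symbol]  # terminal symbol: nothing to expand
    out = []
    for s in grammar[symbol]:
        out.extend(expand(s, grammar))
    return out


sequence = " ".join(expand("R0"))
print(sequence)
```

Under these assumptions, R0 expands to `A B C | A B C | A B D | A B D`: rule R3 captures the prefix `A B` shared by all four plays, while R1 and R2 capture the two repeated variants.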

Table 2. Summary statistics for the tactics.

Metric	Mean	Std	Min	25%	50%	75%	Max
Frequency	10.61	27.14	1	2	3	7	290
Length	3.39	1.14	2	3	3	4	10
Success rate	0.30	0.27	0	0	0.31	0.5	1

Table 3. Frequent tactics detected by Sequitur for the indirect corner kicks ending in the PM region.

ID	Tactic	Frequency
85	Pass left flank, Pass penalty middle	290
80	Pass left flank, Pass left flank, Pass penalty middle	121
111	Pass penalty middle, Ball movement, Pass penalty middle	99
44	Pass penalty left, Pass penalty middle	97
141	Pass penalty right, Pass penalty middle	61
122	Pass penalty middle, Pass penalty middle	58

Table 4. Tactics where a statistically significant association between its usage and the classes was found.

ID	Tactic	Freq	% of Indirect Corners	Success Rate	$\chi^{2}$	p-Value
11	Pass backfield, Ball movement	30	1.91%	0.90	45.69	$1.38 \times 10^{-11}$
30	Pass penalty left, Ball movement	49	3.13%	0.86	64.69	$8.88 \times 10^{-16}$
98	Pass penalty middle, Ball movement	46	2.94%	0.80	49.14	$2.38 \times 10^{-12}$
136	Pass penalty right, Ball movement	31	1.98%	0.74	24.78	$6.41 \times 10^{-7}$
153	Pass first post, Ball movement	22	1.40%	0.77	20.18	$7.06 \times 10^{-6}$

Table 5. Success rate for the most representative tactics in our data set.

ID	Tactic	Freq	% of Indirect Corners	Success Rate
85	Pass left flank, Pass penalty middle	290	18.51%	0.25
80	Pass left flank, Pass left flank, Pass penalty middle	121	7.72%	0.22
111	Pass penalty middle, Ball movement, Pass penalty middle	99	6.32%	0.18
44	Pass penalty left, Pass penalty middle	97	6.13%	0.27
141	Pass penalty right, Pass penalty middle	61	3.89%	0.30
122	Pass penalty middle, Pass penalty middle	58	3.70%	0.30

Table 6. Play compression statistics.

Metric	Symbolic Plays	Compressed Plays
Mean play length	3.17	1.23
Maximum play length	35	9
Standard deviation of play length	1.80	0.62
Mean compression factor	NA	2.61
Standard deviation of compression factor	NA	0.78
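
The compression factor in Table 6 is not defined in this part of the text. A natural reading, assumed here, is the ratio between the length of a symbolic play and the length of its compressed counterpart, averaged over plays. The sketch below uses made-up toy plays rather than data from the paper, and the function and rule names are hypothetical.

```python
from statistics import mean


def compression_factor(symbolic_play, compressed_play):
    """Assumed definition: symbolic length divided by compressed length."""
    if not compressed_play:
        raise ValueError("compressed play must contain at least one symbol")
    return len(symbolic_play) / len(compressed_play)


# Toy (symbolic, compressed) pairs; illustrative only, not from the data set.
plays = [
    (["A", "B", "C"], ["R1"]),            # 3 symbols -> 1 rule
    (["A", "B", "D", "E"], ["R2", "E"]),  # 4 symbols -> 2 symbols
]
factors = [compression_factor(s, c) for s, c in plays]
print(factors, mean(factors))
```

A per-play average of this kind generally differs from the ratio of the mean lengths, which would be consistent with Table 6 reporting 2.61 rather than 3.17/1.23 ≈ 2.58.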

Table 7. Statistics for plays and contrast patterns per tactic. First, we display the play count for each tactic, categorized by their outcomes. Second, we present the number of contrast patterns derived from these plays, with the class indicating which class the pattern primarily supports.

ID	Tactic Name	Plays (Successful)	Plays (Failed)	Contrast Patterns (Successful)	Contrast Patterns (Failed)
0	Direct	1515	3459	267	383
44	Near post	26	70	2	1
80	Short variation	26	95	2	0
85	Short	73	217	46	18
111	Rebound	18	81	13	0
141	Far post	18	43	15	0

Table 8. Most favorable conditions for tactic application.

CPID	Tactic Name	Contrast Pattern	Support (Success)	Support (Fail)	Support Difference
1	Far post	team avg. def. height ≥ 183 cm ∧ goal difference ≥ 0 ∧ play avg. off. height < 191 cm	0.83 = (15/18)	0.19 = (8/43)	0.64
2	Rebound	team avg. def. market val. ≤ 12.15 M€ ∧ team avg. def. age ≥ 27 years ∧ play duration < 10 s	0.83 = (15/18)	0.26 = (21/81)	0.57
3	Near post	play avg. off. height ≥ 177 cm ∧ play avg. off. market val. > 6.15 M€ ∧ tournament progress ≤ 0.85	0.85 = (22/26)	0.36 = (25/70)	0.49
4	Direct	play avg. off. height ≥ 186 cm ∧ play duration < 6 s ∧ Goalkeeper leaving line = False	0.59 = (887/1515)	0.22 = (777/3459)	0.37
5	Short variation	play avg. off. height ≥ 177 cm ∧ preparation time < 19 s ∧ num. duels ≥ 1	0.5 = (13/26)	0.13 = (12/95)	0.37
6	Short	play avg. off. height ≥ 174 cm ∧ play duration < 8 s ∧ num. duels ≥ 1	0.55 = (40/73)	0.21 = (45/217)	0.34
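
The support values in Table 8 can be recomputed from the counts shown in parentheses: the support of a pattern in a class is the fraction of that class's plays that satisfy it. The sketch below is a minimal check with hypothetical helper names. It assumes that supports are rounded to two decimals before subtracting, since that appears to be how the Difference column was obtained (for example, 15/18 − 8/43 ≈ 0.647, whereas the table lists 0.64 = 0.83 − 0.19).

```python
def support(matches, total):
    """Fraction of plays in a class that satisfy the pattern."""
    return matches / total


# (tactic, successful matches, successful plays, failed matches, failed plays)
# Counts copied from Table 8.
ROWS = [
    ("Far post", 15, 18, 8, 43),
    ("Rebound", 15, 18, 21, 81),
    ("Near post", 22, 26, 25, 70),
    ("Direct", 887, 1515, 777, 3459),
    ("Short variation", 13, 26, 12, 95),
    ("Short", 40, 73, 45, 217),
]


def support_difference(s_hit, s_all, f_hit, f_all):
    # Assumption: round each support to two decimals first, then subtract.
    s = round(support(s_hit, s_all), 2)
    f = round(support(f_hit, f_all), 2)
    return s, f, round(abs(s - f), 2)


results = {name: support_difference(*counts) for name, *counts in ROWS}
for name, (s, f, d) in results.items():
    print(f"{name}: success={s:.2f} fail={f:.2f} difference={d:.2f}")
```

The class totals used here (for example, 18 successful and 43 failed far post plays) match the play counts listed in Table 7.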

Table 9. Support difference of the most favorable conditions for tactic application when evaluated across other tactics.

CPID	Far Post	Rebound	Near Post	Direct	Short Variation	Short
1	0.64	0.07	0.02	0.04	0.05	0.06
2	0.06	0.57	0.10	0.01	0.06	0.06
3	0.05	0.04	0.49	0.01	0.01	0.09
4	0.09	0.02	0.02	0.37	0.00	0.10
5	0.03	0.14	0.09	0.08	0.37	0.17
6	0.04	0.03	0.14	0.33	0.20	0.34

Table 10. Unfavorable conditions for tactic application.

CPID	Tactic	Contrast Pattern	Support (Success)	Support (Fail)	Support Difference
7	Near post	play avg. off. market val. ≤ 31.15 M€ ∧ goal difference < 2 ∧ def. goalkeeper market val. ≤ 33.5 M€	0.42 = (11/26)	0.83 = (58/70)	0.41
8	Direct	(178 cm ≤ play avg. off. height < 186 cm) ∧ num. duels < 2	0.12 = (178/1515)	0.51 = (1760/3459)	0.39
9	Short	team avg. def. market val. > 2.15 M€ ∧ team avg. def. age ≥ 25 years ∧ num. duels < 2	0.18 = (13/73)	0.52 = (113/217)	0.34

Table 11. Support difference of the most unfavorable conditions for tactic application when evaluated across other tactics.

CPID	Far Post	Rebound	Near Post	Direct	Short Variation	Short
7	0.12	0.12	0.41	0.01	0.14	0.04
8	0.03	0.02	0.11	0.39	0.03	0.14
9	0.05	0.01	0.00	0.26	0.25	0.34

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
