Applying AI & Data Science to MDL Case Processes

Motivation
Identifying emerging potential multidistrict litigation (MDL) cases is an interesting, non-trivial task for law firms. Doing so quickly gives a firm a competitive edge over others, and we believe that automating the process with a data-driven approach is a logical next step for us to explore.
This article presents our findings from scraping court docket data and from training a machine learning model to predict case outcomes.
Scope Disclaimer
MDL represents the highest-volume end of the U.S. mass-tort spectrum, where the Judicial Panel on Multidistrict Litigation (JPML) consolidates large numbers of related federal cases for coordinated pre-trial management. By design, this report focuses on these large, court-managed proceedings, as their size and reach allow rich, publicly accessible docket data to be extracted and analysed at scale. However, many profitable mass-tort campaigns never reach the MDL threshold, so the figures presented here should not be interpreted as representative of the entire potential mass-tort market.
Data Mining
We used a multi-step process to obtain our dataset. This involved scraping the JPML website for terminated cases, utilising EXA Research to identify court dockets, and several data engineering steps to clean and process the data. Any missing docket metadata was scraped and filled in using in-house extraction agents.
Schema
Based on the data, we propose modelling an MDL case as a progression through the following stages:
# | Stage | What it covers |
1 | Initial transfer & coordination | JPML petition, transfer order, docket consolidation, case-tagging |
2 | Pre-trial proceedings | Master/short-form pleadings, Rule 12 motions, common discovery, Daubert |
3 | Bellwether trials | Representative MDL trials to gauge value and legal issues |
4 | Settlement negotiation | Global frameworks, court-supervised mediations, individual deals |
5 | Remand / individual resolution | Unsettled cases returned to original courts (or state courts) for further action |
6 | Appeals & post-MDL management | Interlocutory or final appeals, mandate returns, coordination after remand |
7 | Termination outcomes | |
7a | Trial verdict entered | Jury/bench verdict → final judgment (record verdict direction: plaintiff or defendant) |
7b | Settlement finalised | Court-approved global settlement or stipulated dismissal with consideration; often follows bellwethers or mediation |
7c | Non-trial judgment | Merits resolved without trial – summary judgment (Rule 56) or default judgment when defendant fails to appear |
7d | Procedural dismissal / closure | Case ends without merits decision – Rule 12 dismissal, voluntary Rule 41 drop, administrative closure, or bankruptcy-based closure |
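One way to make the schema above usable in code is a small enumeration. This is a minimal sketch rather than our production representation: the identifier names (`MDLStage`, `is_terminal`, `outcome_label`) are illustrative assumptions, and only the stage codes come from the table.

```python
from enum import Enum


class MDLStage(Enum):
    """Stage codes from the schema table; member names are illustrative."""
    INITIAL_TRANSFER = "1"
    PRE_TRIAL = "2"
    BELLWETHER_TRIALS = "3"
    SETTLEMENT_NEGOTIATION = "4"
    REMAND = "5"
    APPEALS = "6"
    TRIAL_VERDICT = "7a"
    SETTLEMENT_FINALISED = "7b"
    NON_TRIAL_JUDGMENT = "7c"
    PROCEDURAL_DISMISSAL = "7d"


def is_terminal(stage: MDLStage) -> bool:
    """Termination outcomes are the 7a to 7d sub-stages."""
    return stage.value.startswith("7")


def outcome_label(stage: MDLStage) -> str:
    """Collapse a final stage into Settlement (7b), Dismissal (7d), or Other."""
    if stage is MDLStage.SETTLEMENT_FINALISED:
        return "Settlement"
    if stage is MDLStage.PROCEDURAL_DISMISSAL:
        return "Dismissal"
    return "Other"
```

For example, `outcome_label(MDLStage("7b"))` gives `"Settlement"`, while a verdict (7a) or a non-trial judgment (7c) falls into `"Other"`.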
Classification Performance
We used GPT-4.1 to classify the final state of each MDL case, and trained a CatBoost model to predict one of three outcomes: Settlement (7b), Dismissal (7d), or Other. On the test set, the model achieved 73% accuracy and correctly identified 100% of Settlement cases and 76% of Dismissal cases. There were no false positives (incorrect win predictions). This suggests that the model is conservative about overestimating success, which is favourable in legal predictions.
Metric | Value |
CV macro‑F1 (best) | 0.585 |
Test accuracy | 0.729 |
Test macro‑Precision | 0.651 |
Test macro‑Recall | 0.649 |
Test macro‑F1 | 0.630 |
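For readers unfamiliar with the macro-averaged metrics in the table, the sketch below shows how they can be computed from true and predicted labels. It is a plain-Python illustration on made-up labels, not our evaluation code; the function name `macro_metrics` and the toy data are assumptions.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed a case of the true class

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,  # mean of per-class F1 scores
    }


# Toy labels for illustration only (not drawn from the MDL dataset).
y_true = ["Settlement", "Settlement", "Dismissal", "Dismissal", "Other", "Other"]
y_pred = ["Settlement", "Settlement", "Dismissal", "Other", "Other", "Dismissal"]
scores = macro_metrics(y_true, y_pred)
```

Because macro-F1 is the mean of the per-class F1 scores rather than the harmonic mean of macro-precision and macro-recall, it can sit below both of them, as it does in the table above.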
Takeaways from the Data
Distribution of MDL Outcomes and Durations
Based on the scraped data spanning 2005 to 2023, most MDLs terminate with either 7b: Settlement or 7d: Dismissal.
MDLs that appear to have terminated at an interim stage most likely lack discoverable docket data. We have classified them as ‘Other’ for brevity.


All MDLs in the dataset terminated within 20 years. 2014 had the longest median time to termination (7.7 years), with 2011 a close second (7.4 years).
The time from a motion being filed to the court hearing is just over 36 days.
The wait from the court hearing to leadership setup is about half that (18 days).
The average discovery duration is close to a year.
The time from the court hearing to a bellwether trial is about 1.3 years.
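Durations like those above can be derived from milestone dates on each docket. The following is a minimal sketch with hypothetical field names (`motion_filed`, `hearing`, `leadership`) and invented dates. It uses the median as the summary statistic, which is an assumption on our part for this illustration.

```python
from datetime import date
from statistics import median

# Hypothetical milestone dates for three MDLs (not real docket data).
cases = [
    {"motion_filed": date(2015, 1, 5), "hearing": date(2015, 2, 10),
     "leadership": date(2015, 2, 28)},
    {"motion_filed": date(2016, 3, 1), "hearing": date(2016, 4, 8),
     "leadership": date(2016, 4, 28)},
    {"motion_filed": date(2017, 6, 1), "hearing": date(2017, 7, 5)},
]


def days_between(case, start, end):
    """Days from one milestone to another, or None if either is missing."""
    s, e = case.get(start), case.get(end)
    if s is None or e is None:
        return None
    return (e - s).days


def median_gap(cases, start, end):
    """Median gap in days across the cases that have both milestones."""
    gaps = [days_between(c, start, end) for c in cases]
    gaps = [g for g in gaps if g is not None]
    return median(gaps) if gaps else None


motion_to_hearing = median_gap(cases, "motion_filed", "hearing")
hearing_to_leadership = median_gap(cases, "hearing", "leadership")
```

Cases with a missing milestone are skipped for that interval only, so incomplete dockets still contribute to the intervals they do cover.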

Bellwether Trials by Year and Nature of Suit (NOS)
2013 saw the biggest spike, with 8 bellwether trials held that year.
2007 and 2017 were joint second, with 6 bellwether trials each.

Most bellwether trials related to Personal Injury - Product Liability and Antitrust, followed by Other.

Types of MDLs by Settlement/Dismissal
Interestingly, most Antitrust and Personal Injury - Product Liability MDLs ended in settlement, whereas MDLs relating to Patents or Retirement Security were more likely to be dismissed.

MDLs based on 28 U.S.C. § 1331 (Federal Question) and 15 U.S.C. § 1 (Sherman Antitrust Act, Restraint of Trade) were more likely to be settled.
MDLs under 28 U.S.C. § 1332 (Diversity Jurisdiction) show a roughly 1:1 settlement-to-dismissal ratio. This is typical for this stage, where stronger claims are settled, either voluntarily or under bellwether leverage, and weaker claims are dismissed.
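A settlement-to-dismissal ratio per statutory basis can be computed with a simple tally. The sketch below uses invented records and docket-style cause codes (for example `28:1332` for Diversity Jurisdiction) purely for illustration. The function name and the data are assumptions, not our pipeline.

```python
from collections import defaultdict

# Hypothetical (cause code, outcome) pairs; not real MDL records.
records = [
    ("28:1332", "Settlement"), ("28:1332", "Dismissal"),
    ("28:1332", "Settlement"), ("28:1332", "Dismissal"),
    ("15:1", "Settlement"), ("15:1", "Settlement"),
    ("15:1", "Settlement"), ("15:1", "Dismissal"),
    ("28:1331", "Settlement"), ("28:1331", "Settlement"),
    ("28:1331", "Other"),
]


def settlement_dismissal_ratio(records):
    """Map each cause code to settlements / dismissals.

    Returns None for a cause with no dismissals, where the ratio is undefined.
    Outcomes other than Settlement and Dismissal are ignored.
    """
    counts = defaultdict(lambda: {"Settlement": 0, "Dismissal": 0})
    for cause, outcome in records:
        if outcome in ("Settlement", "Dismissal"):
            counts[cause][outcome] += 1
    return {
        cause: (c["Settlement"] / c["Dismissal"]) if c["Dismissal"] else None
        for cause, c in counts.items()
    }


ratios = settlement_dismissal_ratio(records)
```

With small per-cause counts a ratio like this is noisy, so it is best read alongside the underlying case numbers.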
MDL Phase Transitions
The diagram below shows the proportions in which MDLs in the dataset move between stages over time.
Note that even after MDL settlements are finalised, a significant portion of them (~30%) are re-opened for negotiations, individual resolutions, and further appeals.
A small proportion (~15%) of cases are abruptly dismissed at the initial transfer stage.
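Transition proportions of this kind can be estimated by counting consecutive stage pairs in each MDL's stage history. The sketch below is illustrative only: the stage paths are invented, `END` is a sentinel we add to mark that a case left the docket, and the proportions are computed per transition, which may differ from how the diagram was built.

```python
from collections import Counter, defaultdict

END = "END"  # sentinel marking the end of an MDL's recorded history

# Hypothetical stage histories using the schema's stage codes.
paths = [
    ["1", "2", "4", "7b"],
    ["1", "7d"],
    ["1", "2", "7d"],
    ["1", "2", "4", "7b", "4", "7b"],  # settlement re-opened once
]


def transition_proportions(paths):
    """For each stage, the share of its outgoing transitions per next stage."""
    counts = defaultdict(Counter)
    for path in paths:
        full = list(path) + [END]
        for src, dst in zip(full, full[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


props = transition_proportions(paths)
```

In this toy data, `props["7b"]["4"]` is one third (one of three recorded settlements is followed by renewed negotiation) and `props["1"]["7d"]` is 0.25 (one of four MDLs is dismissed straight from the transfer stage).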

Top 10 Case Types by Nature of Suit
The most common case types were Antitrust (39), Other (37), and Personal Injury - Product Liability (36).

Conclusion
We have demonstrated that a systematic approach can reveal patterns in consolidated MDL docket data, including how cases progress through stages and how jurisdictional basis relates to case resolution. While MDL proceedings represent only a small proportion of the wider mass-tort landscape, their scale and regular structure make them a natural testing ground for predictive analytics in litigation. Our findings offer possible insights for proactive case selection and resource allocation at law firms, which we expect to become increasingly valuable in high-volume litigation.

