COLIEE-2019 CALL FOR TASK PARTICIPATION

Competition on Legal Information Extraction/Entailment (COLIEE)

COLIEE-2019 Workshop: June 21st, 2019
Run in association with the International Conference on Artificial Intelligence and Law (ICAIL) 2019
COLIEE registration due: February 26th, 2019
Those who wish to use previous COLIEE data for a trial should contact rabelo(at)ualberta.ca
Sponsored by: Alberta Machine Intelligence Institute (AMII), University of Alberta, National Institute of Informatics (NII), vLex Canada, Ross Intelligence, and Intellicon
Four tasks are included in the 2019 competition: Tasks 1 and 2 make up the case law competition, and Tasks 3 and 4 make up the statute law competition. Task 1 is a legal case retrieval task: it involves reading a new case Q and extracting supporting cases S1, S2, ..., Sn from the provided case law corpus that support the decision for Q. Task 2 is the legal case entailment task, which involves the identification of a paragraph from existing cases that entails the decision of a new case. As in previous COLIEE competitions, Task 3 is to consider a yes/no legal question Q and retrieve the relevant statutes from a database of Japanese Civil Code articles; Task 4 is to confirm entailment of a yes/no answer from the retrieved Civil Code articles.

1. Task Description

1.1 (COLIEE Case Law Competition) Task 1: The Legal Case Retrieval Task

This legal case competition focuses on two aspects of legal information processing related to a database of predominantly Federal Court of Canada case law, provided by Compass Law. The legal case retrieval task involves reading a new case Q and extracting supporting cases S1, S2, ..., Sn for the decision of Q from the entire case law corpus. Throughout this document, we call the cases that support the decision of a new case 'noticed cases'.

1.2 (COLIEE Case Law Competition) Task 2: The Legal Case Entailment Task

This task involves the identification of a paragraph from existing cases that entails the decision of a new case. Given a decision Q of a new case and a relevant case R, a specific paragraph of R that entails the decision Q needs to be identified. Using some examples, we confirmed that the answer paragraph cannot be identified merely by information retrieval techniques: because the case R is relevant to Q, many paragraphs in R can be relevant to Q regardless of entailment. This task therefore requires a specific entailment method that compares the meaning of each paragraph in R with the decision Q.

1.3 (COLIEE Statute Law Competition) Task 3: The Statute Law Retrieval Task

The COLIEE statute law competition focuses on two aspects of legal information processing related to answering yes/no questions from Japanese legal bar exams (the relevant data sets have been translated from Japanese to English). Task 3 involves reading a legal bar exam question Q and extracting a subset of Japanese Civil Code articles S1, S2, ..., Sn from the entire Civil Code which are appropriate for answering the question, such that Entails(S1, S2, ..., Sn, Q) or Entails(S1, S2, ..., Sn, not Q). Given a question Q and the entire set of Civil Code articles, systems have to retrieve the set S1, S2, ..., Sn as the answer for this track.

1.4 (COLIEE Statute Law Competition) Task 4: The Legal Question Answering Task

Task 4 involves the identification of an entailment relationship such that Entails(S1, S2, ..., Sn, Q) or Entails(S1, S2, ..., Sn, not Q). Given a question Q, the relevant articles S1, S2, ..., Sn are first retrieved (phase one), and systems then have to determine whether the relevant articles entail "Q" or "not Q". The answer for this track is binary: "YES" ("Q") or "NO" ("not Q").
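As an illustration of the retrieval formulation in Task 1, here is a minimal, unofficial TF-IDF baseline sketch; the file layout, file names and the similarity threshold are assumptions for illustration only, not part of the competition.

```python
# Minimal unofficial TF-IDF retrieval baseline sketch for Task 1.
# ASSUMPTIONS: one plain-text file per candidate case under
# candidate_cases/, the query case in query_case.txt, and an
# arbitrary 0.2 similarity threshold for predicting 'noticed'.
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

query_text = Path("query_case.txt").read_text(encoding="utf-8")
candidate_paths = sorted(Path("candidate_cases").glob("*.txt"))
candidate_texts = [p.read_text(encoding="utf-8") for p in candidate_paths]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(candidate_texts + [query_text])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

# Every candidate above the threshold is predicted as a noticed case.
noticed = [p.stem for p, s in zip(candidate_paths, scores) if s > 0.2]
print(noticed)
```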
2. Data Corpus

2.1 Case Law Competition Data Corpus (Tasks 1 and 2)

COLIEE-2019 data is drawn from an existing collection of predominantly Federal Court of Canada case law. Participants can choose which of the two sub-tasks they will apply for:

1) Task 1: Legal information retrieval task. Input is an unseen legal case Q, and output should be the cases in the given legal case corpus that support the decision of the input case, i.e., the 'noticed cases'.

2) Task 2: Recognizing entailment between the decision of a new case and a relevant case. Input is a decision paragraph from an unseen case together with a relevant case. Output should be the specific paragraph of the relevant case that entails the decision of the unseen case.

2.2 Statute Law Competition Data Corpus (Tasks 3 and 4)

The corpus of legal questions is drawn from Japanese legal bar exams, and all of the Japanese Civil Code articles are also provided (file format and access described below). Participants can choose which of the two sub-tasks they will apply for:

1) Task 3: Legal information retrieval task. Input is a bar exam 'Yes/No' question, and output should be the relevant Civil Code articles.

2) Task 4: Recognizing entailment between law articles and queries. Input is a bar exam 'Yes/No' question. After retrieving the relevant articles using your method, you have to determine 'Yes' or 'No' as the output.

3. Measuring the Competition Results

3.1 Measuring the Case Law Competition Results (Tasks 1 and 2)

For Tasks 1 and 2, the evaluation measures will be precision, recall and F-measure:

Precision = (the number of correctly retrieved cases (paragraphs) for all queries) / (the number of retrieved cases (paragraphs) for all queries)

Recall = (the number of correctly retrieved cases (paragraphs) for all queries) / (the number of relevant cases (paragraphs) for all queries)

F-measure = (2 x Precision x Recall) / (Precision + Recall)

In the evaluation of Tasks 1 and 2, we simply use the micro-average (the evaluation measure is calculated over the pooled results of all queries) rather than the macro-average (the evaluation measure is calculated for each query and then averaged).

3.2 Measuring the Statute Law Competition Results (Tasks 3 and 4)

For Task 3, the evaluation measures will be precision, recall and F2-measure (since the IR process is a pre-process that selects candidate articles for the entailment process, we put the emphasis on recall):

Precision = average of (the number of correctly retrieved articles for each query) / (the number of retrieved articles for each query)

Recall = average of (the number of correctly retrieved articles for each query) / (the number of relevant articles for each query)

F2-measure = (5 x Precision x Recall) / (4 x Precision + Recall)

In addition to the above evaluation measures, standard information retrieval measures such as Mean Average Precision and R-precision may be used for discussing the characteristics of the submitted results. In COLIEE 2019, the final evaluation score over all queries is calculated as a macro-average (the evaluation measure is calculated for each query and the average is used as the final score) instead of a micro-average (the evaluation measure is calculated over the pooled results of all queries).

For Task 4, the evaluation measure will be accuracy, with respect to whether the yes/no question was correctly confirmed:

Accuracy = (the number of queries which were correctly confirmed as true or false) / (the number of all queries)

(An illustrative sketch of these evaluation measures is given below, after the submission details.)

4. Submission Details

Participants are required to submit a paper on their method and experimental results. At least one of the authors of an accepted paper has to present the paper at the special COLIEE session of ICAIL 2019.
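To make the micro- vs. macro-averaging distinction in Section 3 concrete, here is a small, unofficial sketch; the gold and predicted answer sets are made-up examples.

```python
# Unofficial illustration of the COLIEE evaluation measures.
# The gold and predicted answer sets below are made-up examples.

def f_beta(p, r, beta):
    # General F-measure; beta=1 gives F1, beta=2 gives F2 as in Section 3.
    if p + r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

gold = {"t1-1": {"18", "45", "130"}, "t1-2": {"433"}}
pred = {"t1-1": {"18", "45"}, "t1-2": {"433", "99"}}

# Tasks 1 and 2: micro-average, pooling the counts over all queries.
tp = sum(len(gold[q] & pred[q]) for q in gold)
precision = tp / sum(len(pred[q]) for q in gold)
recall = tp / sum(len(gold[q]) for q in gold)
print("micro F1:", f_beta(precision, recall, beta=1))

# Task 3: macro-average, computing the measure per query, then averaging.
f2s = []
for q in gold:
    p = len(gold[q] & pred[q]) / len(pred[q]) if pred[q] else 0.0
    r = len(gold[q] & pred[q]) / len(gold[q])
    f2s.append(f_beta(p, r, beta=2))
print("macro F2:", sum(f2s) / len(f2s))
```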
The papers by the winners of each task in the competition will be further reviewed and considered for inclusion in the main ICAIL 2019 proceedings, so please write your papers with that in mind. In addition, we expect to offer a cash prize for the winners in each category. Papers should conform to the standards set out at https://icail2019-cyberjustice.com/calls/call-papers/ and be submitted to the COLIEE 2019 EasyChair submission webpage.

5. Schedule

Feb 15, 2019: Dry run data release.

6. Details of Each Task

6.1 Task 1 Details

Our goal is to explore and evaluate legal document retrieval technologies that are both effective and reliable. The task investigates the performance of systems that search a set of case law for cases that support an unseen case. The goal of the task is to return the 'noticed cases' in the given collection for a query. We say a case is 'noticed' with respect to a query iff the case supports the decision of the query case. In this task, the query case does not include the decision, because our goal is to measure how accurately a machine can identify decision-supporting cases for a new case (with no decision yet). A corpus composed of Federal Court of Canada case law will be provided. The process of executing the new query cases over the existing cases and generating the experimental runs should be entirely automatic. In the training data, each query case is provided with a pool of case law, and the noticed cases within the pool are identified. The test data will include only query cases and a pool of case law, with no noticed case information. There should be no human intervention at any stage, including modifications to your retrieval system motivated by an inspection of the test queries. You should not peek at the test data before you submit your runs. At most three runs from each group will be assessed. The submission format and evaluation methods are described below.

6.2 Task 2 Details

Our goal is to predict the decision of a new case by entailment from previous relevant cases. As a simpler version of predicting a decision, a decision of a new case and a noticed case are given as a query. Your legal textual entailment system must then identify which paragraph in the noticed case entails the decision, by comparing the meanings of the query and the paragraphs. The task investigates the performance of systems that identify a paragraph that entails the decision of an unseen case. The training data consists of triples of a query, a noticed case, and the paragraph number of the noticed case by which the decision of the query is entailed. The process of executing the queries over the noticed cases and generating the experimental runs should be entirely automatic. The test data will include only queries and noticed cases, with no paragraph numbers. There should be no human intervention at any stage, including modifications to your retrieval system motivated by an inspection of the test queries. 'Decision', in this context, does not mean the final decision of a case, but rather a conclusion expressed by the judge which is entailed by one or more particular paragraphs of the noticed case. In our dataset, this information is packaged in a file named 'entailed_fragment.txt'.
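As noted in Section 1.2, similarity-based retrieval alone cannot solve Task 2; still, a naive similarity baseline helps illustrate the task's input/output shape. In this unofficial sketch, the file names, the blank-line paragraph splitting and the 1-based paragraph numbering are all assumptions.

```python
# Unofficial naive baseline sketch for Task 2: pick the paragraph of the
# noticed case most similar to the decision fragment. ASSUMPTIONS: file
# names, blank-line paragraph splitting, 1-based paragraph numbering.
# Similarity alone is NOT sufficient for this task (see Section 1.2).
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

decision = Path("entailed_fragment.txt").read_text(encoding="utf-8")
case_text = Path("noticed_case.txt").read_text(encoding="utf-8")
paragraphs = [p for p in case_text.split("\n\n") if p.strip()]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(paragraphs + [decision])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

# Report the highest-scoring paragraph (assuming 1-based numbering).
print("predicted entailing paragraph:", scores.argmax() + 1)
```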
6.3 Task 3 Details

Our goal is to explore and evaluate legal document retrieval technologies that are both effective and reliable. The task investigates the performance of systems that search a static set of Civil Code articles using previously unseen queries. The goal of the task is to return the articles in the collection that are relevant to a query. We call an article 'Relevant' to a query iff the query sentence can be answered Yes/No by entailment from the meaning of the article. If combining the meanings of more than one article (e.g., "A", "B" and "C") can answer a query sentence, then all of those articles ("A", "B" and "C") are considered "Relevant". If a query can be answered by an article "D", and it can also be answered independently by another article "E", then both articles "D" and "E" are considered "Relevant". This task requires the retrieval of all of the articles that are relevant to answering a query. The Japanese Civil Code articles (the English translation alongside the Japanese) will be provided, and the training data consists of pairs of a query and its relevant articles. The process of executing the queries over the articles and generating the experimental runs should be entirely automatic. The test data will include only queries, with no relevant articles. There should be no human intervention at any stage, including modifications to your retrieval system motivated by an inspection of the queries. You should not materially modify your retrieval system between the time you download the queries and the time you submit your runs. At most three runs from each group will be assessed. The submission format and evaluation methods are described below.

6.4 Task 4 Details

Our goal is to construct Yes/No question answering systems for legal queries that answer by entailment from the relevant articles. Given a 'Yes/No' legal bar exam question, your legal information retrieval system retrieves the relevant Civil Code articles. The task then investigates the performance of systems that answer 'Yes' or 'No' to previously unseen queries by comparing the meanings of the queries with those of the retrieved Civil Code articles. The training data consists of triples of a query, the relevant article(s), and the correct answer "Y" or "N". The test data will include only queries, with no 'Y/N' labels and no relevant articles. There should be no human intervention at any stage, including modifications to your retrieval system motivated by an inspection of the queries. You should not materially modify your retrieval system between the time you download the queries and the time you submit your runs. At most three runs from each group will be assessed. The submission format and evaluation methods are described below.

7. Corpus Structure

The structure of the test corpora is derived from a general XML representation developed for use in RITEVAL, one of the tasks of the NII Testbeds and Community for Information access Research (NTCIR) project, as described at the following URL: http://sites.google.com/site/ntcir11riteval/ The RITEVAL format was developed for the general sharing of information retrieval data across a variety of domains.

7.1 Case Law Competition Corpus Structure (Tasks 1 and 2)

The format of the COLIEE competition corpora is derived from an NTCIR representation of confirmed relationships between questions and cases, as in the following example:
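A minimal sketch of what such a training entry can look like, reconstructed from the description below; apart from the <cases_noticed> tag, the query id and the case IDs, the tag names are illustrative assumptions.

```xml
<!-- Hedged reconstruction of a Task 1 training entry. Only the
     <cases_noticed> tag, the query id and the case IDs come from the
     description below; the other tag names are assumptions. -->
<query id="t1-001">
  <query_case>FRAGMENT_SUPPRESSED ... text of the query case ...</query_case>
  <cases_noticed>008, 045, 130</cases_noticed>
  <candidate_cases>001, 002, ..., 200</candidate_cases>
</query>
```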
The above is an example of Task 1 training data, where the query case with id "t1-001" has 3 noticed cases (IDs 008, 045 and 130) out of 200 candidate cases. The test corpora will not include the <cases_noticed> tag. Out of the given candidate cases for each query, you will be required to retrieve the true noticed cases. The original query case files were edited: fragments which could give away the answers in a straightforward manner were replaced by a 'FRAGMENT_SUPPRESSED' marker.
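A Task 2 training entry can be sketched analogously; apart from the <entailing_paragraphs> tag and the paragraph id, the tag names are illustrative assumptions.

```xml
<!-- Hedged reconstruction of a Task 2 training entry. Only the
     <entailing_paragraphs> tag and the paragraph id come from the
     description below; the other tag names are assumptions. -->
<query id="t2-001">
  <decision>FRAGMENT_SUPPRESSED ... the entailed fragment ...</decision>
  <noticed_case>... full text of the noticed case, split into
    numbered paragraphs ...</noticed_case>
  <entailing_paragraphs>013</entailing_paragraphs>
</query>
```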
The above is an example of Task 2 training data, where a decision (i.e., the entailed fragment) in the query is entailed by paragraph id 013 of the given noticed case. The decision in the query is not the whole decision of the case; it is a decision for one part of the case, and a paragraph that supports this decision should be identified in the given noticed case. The test corpora will not include the <entailing_paragraphs> tag, and you are required to identify the paragraph number which entails the query decision. The original query case files were edited: fragments which could give away the answers in a straightforward manner were replaced by a 'FRAGMENT_SUPPRESSED' marker.

7.2 Statute Law Competition Corpus Structure (Tasks 3 and 4)

The format of the COLIEE competition corpora is derived from an NTCIR representation of confirmed relationships between questions and the articles and cases relevant to answering them, as in the following example:
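A minimal sketch of such an entry in the RITEVAL-style pair format; the <pair>/<t1>/<t2> structure follows the RITE conventions linked above, but the exact names and layout here are assumptions.

```xml
<!-- Hedged sketch of a statute law (Tasks 3 and 4) training entry in the
     RITEVAL-style pair format. The label attribute is the correct Y/N
     answer (used in Task 4); the tag names are assumptions based on the
     RITE conventions linked above. -->
<pair id="H18-1-2" label="Y">
  <t1>(text of the relevant Civil Code article or articles)</t1>
  <t2>(text of the bar exam Yes/No question)</t2>
</pair>
```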
For Tasks 3 and 4, the training data will be the same. Groups who participate only in Task 3 can disregard the pair label.

8. Competition Results Submission Format

8.1 Task 1

For Task 1, a submission should consist of a single ASCII text file. Use a single space to separate columns, with three columns per line as follows:

t1-1 18 univABC
t1-1 45 univABC
t1-1 130 univABC
t1-2 433 univABC
...

where:
1. The first column is the query id.
2. The second column is the official case number of the retrieved case.
3. The third column is called the "run tag" and should be a unique identifier for the submitting group, i.e., each run should have a different tag that identifies the group. Please restrict run tags to 12 or fewer letters and numbers, with no punctuation.

In this example of a submission, you can see that t1-1 has multiple noticed cases (18, 45 and 130).

8.2 Task 2

For Task 2, a submission should consist of a single ASCII text file. Use a single space to separate columns, with three columns per line as follows:

t2-1 13 univABC
t2-2 37 univABC
t2-2 2 univABC
t2-3 8 univABC
...

where:
1. The first column is the query id.
2. The second column is the paragraph number which entails the decision.
3. The third column is the "run tag", a unique identifier for the submitting group, as in Task 1.

A query can have multiple entailing paragraph numbers.

8.3 Task 3

The submission format for Task 3 is the TREC eval format used by the trec_eval program. Use a single space to separate columns, with six columns per line as follows:

H21-5-3 Q0 213 1 0.8 univABC

where:
1. The first column is the query id.
2. The second column is "iter" for trec_eval and is not used in the evaluation; its contents will be ignored, but please write Q0 in this column.
3. The third column is the official article number of the retrieved article.
4. The fourth column is the rank of the retrieved article.
5. The fifth column is the similarity value (a float) of the retrieved article.
6. The sixth column is the "run tag", a unique identifier for the submitting group, as in Task 1.

Please refer to the README file of trec_eval.8.1.tar.gz for a detailed explanation. The most significant difference between the previous submission format and the new one is that ranked lists must be provided instead of simple answer sets. The maximum number of documents for each query is limited to 100. Participants are also encouraged to submit ranked-list results with 100 candidates for each query. Since such submissions have smaller precision values due to the larger number of candidates, it may be inappropriate to compare them directly with submissions containing small numbers of candidates. To distinguish these different types of submissions, please add the suffix "-L" to the submission result file (e.g., when univABC is the result file for the submission with a limited number of candidates, please use univABC-L for the submission with a large number of candidates).

8.4 Task 4

For Task 4, again a submission should consist of a single ASCII text file. Use a single space to separate columns, with three columns per line as follows:

H18-1-2 Y univABC
H18-5-A N univABC
H19-19-I Y univABC
H21-5-3 N univABC
...

where columns 1 and 3 are as for Task 3 (phase one), and column 2 is "Y" or "N", indicating whether the Y/N question was confirmed to be true ("Y") or false ("N") by the relevant articles.
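As a usage illustration of the Task 3 format, the following unofficial sketch writes a ranked list in the six-column trec_eval format described above; the query id is taken from the example, while the article numbers and similarity scores are made-up placeholders.

```python
# Unofficial sketch: writing Task 3 results in the six-column trec_eval
# format described above. The query id comes from the example; article
# numbers and similarity scores are made-up placeholders.
ranked_results = {
    "H21-5-3": [("213", 0.8), ("198", 0.65), ("94", 0.31)],
}

with open("univABC-L", "w", encoding="ascii") as out:
    for query_id, articles in ranked_results.items():
        # At most 100 candidates may be submitted per query.
        for rank, (article, score) in enumerate(articles[:100], start=1):
            out.write(f"{query_id} Q0 {article} {rank} {score} univABC-L\n")
```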
"Y" or "N" indicating whether the Y/N question was confirmed to be true ("Y") by the relevant articles, or confirmed to be false ("N"). 9. Presentation Schedule09:00 AM - COLIEE 2019 Overview - Juliano Rabelo, Mi-Young Kim, Randy Goebel, Masaharu Yoshioka, Yoshinobu Kano and Ken Satoh 09:30 AM - Task winners announcement - Organizing committee 09:40 AM - HUKB at COLIEE 2019 Information Retrieval Task - Utilization of metadata for relevant case retrieval - Masaharu Yoshioka and Zihao Song 10:05 AM - Threshold-Based Retrieval and Textual Entailment Detection on Legal Bar Exam Questions - Sabine Wehnert, Sayed Anisul Hoque, Wolfram Fenske and Gunter Saake 10:30 AM - Break 11:00 AM - Legal Information Retrieval with Generalized Language Models - Julien Rossi and Evangelos Kanoulas [CANCELLED] 11:25 AM - COLIEE Case Law Competition Task 1: The Legal Case Retrieval Task - Rajaa El Hamdani, Aurore Troussel and Claire Houvenagel 11:50 AM - A performance study on fine-tuned large language models in the Legal Case Entailment task - Hiroaki Yamada and Takenobu Tokunaga 12:15 PM - An approach to Statute Law Retrieval Task in COLIEE-2019 - Tran-Binh Dang, Thao Nguyen and Le-Minh Nguyen 12:40 PM - Lunch 01:50 PM - Question Answering System for Legal Bar Examination using Predicate Argument Structures focusing on Exceptions - Reina Hoshino, Naoki Kiyota and Yoshinobu Kano 02:15 PM - IITP@COLIEE 2019: Legal Information Retrieval using BM25 and BERT - Baban Gain, Dibyanayan Bandyopadhyay, Tanik Saikh and Asif Ekbal 02:40 PM - Retrieving Legal Cases with Vector Representations of Text - Guilherme Paulino-Passos and Francesca Toni 03:05 PM - Statutory entailment using similarity features and decomposable attention models - John Hudzina, Thomas Vacek, Kanika Madan, Tonya Custis and Frank Schilder 03:30 PM - Break 04:00 PM - Searching Relevant Articles for Legal Bar Exam by Doc2Vec and TF-IDF - Ryuji Hayashi and Yoshinobu Kano 04:25 PM - Textual entailment using word embeddings and linguistic similarity - Kanika Madan, John Hudzina, Thomas Vacek, Frank Schilder and Tonya Custis 04:50 PM - A Deep Learning Approach for Statute Law Entailment Task in COLIEE-2019 - Ha Thanh Nguyen, Vu Tran and Le Minh Nguyen [CHANGED TO 11:25AM] 10. Task winnersThe winners of each task are awarded a cash prize, courtesy or our sponsors Intellicon and Ross Intelligence. The list of winners is:
Questions and Further Information

rabelo(at)ualberta.ca

Application Details

Potential participants in COLIEE-2019 should respond to this call for participation by submitting an application. To apply, submit the application form and the memorandums from the following URLs to rabelo(at)ualberta.ca:
We will send an acknowledgement to the email address supplied in the form once we have processed the form.

Previous COLIEE Editions

COLIEE 2018 (summary papers on the case law tasks and statute law tasks available)
COLIEE 2017
COLIEE 2016
COLIEE 2015
COLIEE 2014
Last updated: February 2019