Poslfit: Scrabble: .gcg File Format

Version 1.3 - 2024-04-03

The .gcg file format is a human-readable representation of a Scrabble game. It was developed by John Chew in August 2000, for use at the 2000 National Scrabble Championship in Providence, RI, USA, where an early version of his proprietary pgm.pl script converted .gcg files to a set of web pages where viewers could play through a game online. The very first game posted was both played and recorded by Cheryl Tyler and Nigel Richards at Board 1 in Round 1 at NSC 2000.

The file format is based on the ‘DOoM log’ format used since 1993 on PoslDOoM and later MarlDOoM. In June 2005 the DOoM software was modified to permit ‘live annotation’, where annotators enter games on laptops move by move as they are played, permitting near-real-time updates of web sites and eliminating the need to compare, check and key in annotation sheets at the end of a game. The first game to air live was played by Adam Logan and Joel Wapnick in Round 6 at Board 1 at the 2005 Canadian National Scrabble Championship in Toronto, Canada.

By convention, each web game created by pgm.pl includes an invisible but accessible file called input.gcg.

The file consists of a sequence of lines. The lines should be terminated by linefeeds (\012), but parsers should accept any sequence of carriage returns (\015) and linefeeds as a line terminator. Lines consist of a sequence of tokens separated by white space (\011 or \040). Tokens should use printable ISO 8859-1 characters; white space may be used liberally for readability.
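As a rough illustration of the line and token rules above, the following Python sketch decodes a file as ISO 8859-1, splits it on any run of carriage returns and linefeeds, and breaks each line into tokens on tabs and spaces. The function names are hypothetical and not part of the format, and dropping empty lines is an assumption that follows from treating any CR/LF run as a single terminator.

```python
import re


def decode_gcg(raw: bytes) -> str:
    """Decode file contents as ISO 8859-1, the character set named above."""
    return raw.decode("iso-8859-1")


def split_lines(text: str) -> list:
    """Split on any run of CR (\\015) and LF (\\012); skip lines with no tokens."""
    return [line for line in re.split(r"[\r\n]+", text) if line.strip(" \t")]


def tokenize(line: str) -> list:
    """Split a line into tokens on runs of tab (\\011) or space (\\040)."""
    return [token for token in re.split(r"[\t ]+", line) if token]
```

For example, tokenize(">Randy:\tU  -  +0 380") yields the five tokens >Randy:, U, -, +0 and 380, regardless of how much white space separates them.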

The file consists of pragma lines giving meta-information about the game, and event lines describing events which changed the state of the game.

Pragmata

Each pragma consists of a '#', a keyword indicating the type of pragma, white space, and a list of white-space-separated arguments.

Example
    Description

#player1 Jamie James Chew
    Specifies a nickname and full name for the first player. The first token is the nickname, the rest are the full name. Must precede any event lines.

#player2 John John Chew
    Specifies a nickname and full name for the second player. The first token is the nickname, the rest are the full name. Must precede any event lines.

#title Sample Game
    Gives a short window title for the game. Should not contain special HTML characters or entities. Must precede any event lines.

#description This is a sample game.
    Gives a long description of the game. May contain HTML entities. Must precede any event lines.

#incomplete ABCDEFG
    Specifies the rack of the on-turn player. This may be deprecated in future.

#rack1 ABCDEFG
    Gives the first player's rack after the most recently preceding event line. Overrides #incomplete. May be used to specify racks in a game in progress, or to explicitly indicate exceptional situations such as the loss, following an overdraw ruling, of a tile known to be on the previous rack.

#rack2 RETAIN?
    Gives the second player's rack after the most recently preceding event line. Overrides #incomplete. May be used to specify racks in a game in progress, or to explicitly indicate exceptional situations such as the loss, following an overdraw ruling, of a tile known to be on the previous rack.

#note This is a note.
    Gives a note commenting on the most recently preceding event line. May contain HTML entities. May be used to comment on time remaining, or a scoring error.

#id identification-authority unique-identifier
    Declares a unique identifier for the game. Neither field may contain white space. The identification authority should use the reverse domain naming convention (e.g. com.poslfit for an authority associated with the poslfit.com domain) and must not issue duplicate identifiers within its own domain.

#comment date="2006-06-02 13:05"; author="John Chew <jjchew@math.utoronto.ca>"; text="\"Hello, world!\""
    A more structured version of #note, which it may eventually obsolete. Date should be in ISO format.

#htarget
    Obsolete. Was used to give the URL of turn web pages. Now handled in a per-event configuration file.

#ltarget
    Obsolete. Was used to give a relative path to turn web pages. Now handled in a per-event configuration file.

#style
    Obsolete. Was used to specify the visual style of game presentation. Now handled in a per-event configuration file.

#orientation numbered-rows|lettered-rows
    (Proposed 2010-04-29) Specifies whether rows should be numbered (and columns lettered) or vice versa. If unspecified, defaults to numbered-rows.

#tile L·L l·l
    (Proposed 2010-04-29) Declares a multicharacter tile’s upper and lower case encodings.

#lexicon OTCWL2
    (Proposed 2010-04-29, added 2019-05-12) Declares the lexicon in use. May currently be one of: CSW2007 CSW2012 CSW2015 CSW2019 ODS ODS2 ODS3 ODS4 ODS5 OSPD OSPD2 OSPD3 OSPD4 OTCWL OTCWL2 OTCWL2014 OTCWL2016 NWL2018. (Contact John Chew to add to the list.)
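
The pragma syntax described in this section ('#', keyword, white space, arguments) might be recognised along the following lines. This is a sketch only: parse_pragma and parse_player are hypothetical helper names, and the sketch does not attempt to interpret the quoted fields of #comment.

```python
import re


def parse_pragma(line: str):
    """Return (keyword, arguments) for a pragma line, or None for other lines."""
    match = re.match(r"#([^\t ]+)(?:[\t ]+(.*))?$", line)
    if match is None:
        return None
    rest = match.group(2) or ""
    arguments = [token for token in re.split(r"[\t ]+", rest) if token]
    return match.group(1), arguments


def parse_player(arguments):
    """Split #player1/#player2 arguments into (nickname, full name)."""
    nickname, *name_parts = arguments
    return nickname, " ".join(name_parts)
```

Applied to "#player1 Jamie James Chew", parse_pragma gives the keyword player1 with three arguments, and parse_player then separates the nickname Jamie from the full name James Chew.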

Events

An event line has the following white-space separated elements in the following order:

Example
    Description

>David: ANTHER? n8 ANoTHER +73 416
    Regular play: turn player's nickname in >...:, white space, rack from which play was made, coordinate (number first for horizontal plays), word formed (regular tiles in CAPS, blanks in lower case, even for previously placed blanks being played through), signed score for play, cumulative score after play.

>Randy: U - +0 380
    Passed turn: turn player's nickname in >...:, white space, rack from which play was passed, -, +0, cumulative score after turn passed.

>Marlon: SEQSPO? -QO +0 268
    Tile exchange: turn player's nickname in >...:, white space, rack from which tiles were exchanged, exchanged tiles preceded by -, +0, cumulative score after tiles exchanged. If the tiles exchanged are unknown, indicate the number of tiles (1-7) instead. (As of version 1.3) If only some exchanged tiles are known, the unknown ones may be represented by _.

>Ron: MOULAGD -- -76 354
    Withdrawal of challenged phoney: turn player's nickname in >...:, white space, rack from which play was made, --, negative of phoney score, cumulative score before phoney played.

>Joel: DROWNUG (challenge) +5 289
    Bonus for challenged acceptable word: challenged player's nickname in >...:, white space, rack after challenged play, (challenge), signed bonus, cumulative score after bonus.

>Dave: (G) +4 539
    Points scored for opponent's last rack: credited player's nickname in >...:, white space, scored tiles in (...), signed score for tiles, cumulative score after tiles.

>Pakorn: FWLI (FWLI) -10 426
    Points lost for last rack (international rules): penalized player's nickname in >...:, white space, player's last rack, penalized tiles in (...), signed score for tiles, cumulative score after tiles.

>Pakorn: ISBALI (time) -10 409
    Time penalty: penalized player's nickname in >...:, white space, player's last rack, (time), time penalty, cumulative score after penalty.
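
One possible way to tell the event types above apart is by the number and shape of the tokens between the nickname and the two trailing scores. The Python sketch below does this. The Event type, the kind labels and the function names are invented for illustration and are not part of the format; parse_coordinate only reports whether the number or the letter came first, which corresponds to horizontal and vertical plays under the default numbered-rows orientation.

```python
import re
from typing import NamedTuple, Optional


class Event(NamedTuple):
    nickname: str
    kind: str               # label invented for this sketch
    rack: Optional[str]
    detail: Optional[str]   # coordinate and word, exchanged tiles, or tiles in (...)
    score: int
    total: int


def parse_event(line: str) -> Optional[Event]:
    """Classify one event line; return None if it does not look like an event."""
    tokens = [t for t in re.split(r"[\t ]+", line) if t]
    if len(tokens) < 4 or not (tokens[0].startswith(">") and tokens[0].endswith(":")):
        return None
    nickname = tokens[0][1:-1]
    *middle, score, total = tokens[1:]
    rack: Optional[str] = None
    detail: Optional[str] = None
    if len(middle) == 1 and middle[0].startswith("(") and middle[0].endswith(")"):
        kind, detail = "opponent_rack", middle[0][1:-1]
    elif len(middle) == 2:
        rack, action = middle
        if action == "-":
            kind = "pass"
        elif action == "--":
            kind = "withdrawal"
        elif action.startswith("-"):
            kind, detail = "exchange", action[1:]
        elif action == "(challenge)":
            kind = "challenge_bonus"
        elif action == "(time)":
            kind = "time_penalty"
        elif action.startswith("(") and action.endswith(")"):
            kind, detail = "rack_penalty", action[1:-1]
        else:
            return None
    elif len(middle) == 3:
        rack, coordinate, word = middle
        kind, detail = "play", f"{coordinate} {word}"
    else:
        return None
    # int() accepts the leading sign used in the score field, e.g. "+73" or "-76".
    return Event(nickname, kind, rack, detail, int(score), int(total))


def parse_coordinate(coordinate: str):
    """Return (direction, number, letter), or None if the shape is unrecognised."""
    match = re.fullmatch(r"(\d+)([A-Za-z])", coordinate)
    if match:
        return "horizontal", int(match.group(1)), match.group(2).upper()
    match = re.fullmatch(r"([A-Za-z])(\d+)", coordinate)
    if match:
        return "vertical", int(match.group(2)), match.group(1).upper()
    return None
```

With this sketch, the regular-play example is classified as a play with score 73 and cumulative score 416, and its coordinate n8 is reported as vertical because the letter comes first.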

Multicharacter Tiles

Some languages have tiles that have more than one ISO 8859-1 letter on them, such as LL, NY or QU in Catalan, or KA KI KU KE KO hypothetically in Japanese. There are two ways of representing such tiles.

The recommended method is to declare each using a “#tile” pragma. Parsers should assume that any string representing a sequence of tiles consists of character sequences defined in such pragmata together with undeclared single character tiles. Strings should be parsed from left to right (no bidi support for now) for the longest possible matching tile encoding at each point, with no backtracking.

Alternatively, if there are situations where the first method is infeasible (and such situations should be brought to John Chew’s attention), each pair of tiles should be separated by a | symbol, with an optional | at the beginning and/or end of the string. The presence of a | anywhere in a tile string indicates that a | appears between each pair of tiles.

So for example, the Catalan word GORIL·LA should be entered just this way, with a “#tile L·L l·l” pragma at the beginning of the file. (Without the pragma, the parser should interpret “L·L” as three tiles.) It could also be written |G|O|R|I|L·L|A| or G|O|R|I|L·L|A.
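
Both representations can be handled by one small routine. The following Python sketch (split_tiles is a hypothetical name, and the list of declared encodings is assumed to have been collected from “#tile” pragmata beforehand) splits on | when one is present, and otherwise scans left to right taking the longest declared encoding at each point without backtracking.

```python
def split_tiles(text: str, declared: list) -> list:
    """Split a tile string into individual tiles.

    `declared` holds the multicharacter encodings taken from #tile pragmata,
    e.g. ["L·L", "l·l"]; every other character is a single-character tile.
    """
    if "|" in text:
        # Explicit separators: ignore an optional leading and/or trailing |.
        return text.strip("|").split("|")
    # Try longer encodings first so the longest match wins at each position.
    encodings = sorted((e for e in declared if e), key=len, reverse=True)
    tiles = []
    position = 0
    while position < len(text):
        for encoding in encodings:
            if text.startswith(encoding, position):
                tiles.append(encoding)
                position += len(encoding)
                break
        else:
            # No declared encoding starts here: take one character as a tile.
            tiles.append(text[position])
            position += 1
    return tiles
```

With ["L·L", "l·l"] declared, GORIL·LA splits into six tiles; with nothing declared, the same string splits into eight, since L·L is then read as three tiles.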

Future expansion

The following features may be added to the file format when there is demand.

Change Log

1.1. 2017-05-29
Wording clarification over previously unnumbered version
1.2. 2019-05-12
Confirmed #lexicon, added several new values
1.3. 2024-04-03
Added _ to represent unknown rack tile