Adding Support for Arbitrary File Sizes to the Single UNIX Specification

Last Update: 20Mar96

1.0 Overview
1.1 The Large File Problem
1.2 Requirements
1.3 Importance
1.4 Concepts
1.5 Changes and Additions
1.6 Conformance
2.0 Changes to the Single UNIX Specification
2.1 Changes to CAE Specification System Interface Definitions, Issue 4, Version 2
2.2 Changes to CAE Specification System Interfaces and Headers, Issue 4, Version 2
2.2.1 Changes to System Interfaces
2.2.2 Changes to Headers
2.3 Changes to CAE Specification Commands and Utilities, Issue 4, Version 2
3.0 Transitional Extensions to the Single UNIX Specification
3.1 Transitional Extensions to CAE Specification System Interfaces and Headers, Issue 4, Version 2
3.1.1 Transitional Extensions to System Interfaces
3.1.2 Transitional Extensions to Headers
3.2 Transitional Extensions to the mount Utility
3.3 Accessing the Extensions to the SUS

Appendix A: Rationale and Notes
A.1 Overview
A.1.1 Guiding Principles
A.1.2 Concepts
A.2 Changes to the Single UNIX Specification
A.2.1 Changes to CAE Specification System Interfaces and Headers, Issue 4, Version 2
A.2.1.1 Changes to System Interfaces
A.2.2 Changes to CAE Specification Commands and Utilities, Issue 4, Version 2
A.3 Transitional Extensions to the Single UNIX Specification
A.3.1 Transitional Extensions to CAE Specification System Interfaces and Headers, Issue 4, Version 2
A.3.1.1 Transitional Extensions to System Interfaces
A.3.1.2 Transitional Extensions to Headers
A.3.2 Accessing the Transitional Extensions to the SUS

Acknowledgements

Revision Information
23Feb96 Version 1.1
24Feb96 Version 1.2
01Mar96 Version 1.3
05Mar96 Version 1.4
20Mar96 Version 1.5

Acknowledgements

Even with the rise of 64-bit systems, the 32-bit operating system will be with us for a while yet. However, the need for interoperability with 64-bit systems, large applications, large databases, and cheap disks has created a market imperative for the UNIX industry: support large files on 32-bit systems. Most current UNIX systems support file sizes of at most 2^31-1 bytes. This is not enough for today's applications, which include files containing videos, sounds, images, and large databases. Today's 32-bit systems are quite capable of handling the computational needs of these applications, but they need to be able to support maximum file sizes that are many orders of magnitude larger.

This support must be compatible with the existing Single UNIX Specification, and provide a path to conformance with subsequent versions. It must give system vendors a cost-effective approach to adding these features to their existing products, and provide system vendors, software vendors, and users with a clear path for future products. The independent software vendors (ISVs) listed below gathered in a set of meetings with the UNIX systems vendors to develop a common set of APIs and modifications to the Single UNIX Specification to allow support for large files. We called these meetings the Large File Summit. For details of the meetings, how the proposals were developed, and the ISV requirements document, see http://www.sas.com/standards/large.file.

This work is being sent to the X/Open Base System Working Group so they can consider the changes that are suggested for the next generation of the Single UNIX Specification.

The individuals who participated in the Large File Summit meetings and on-line discussions were:

Amdahl Corp.:  Dennis Chapman, John Haines
Convex Computer Corp.:  Mike Carl, Peter Poorman, Tom White
Cray Research, Inc.:  Rick Matthews
Data General Corp.:  Dean Herington
Digital Equipment Corp.:  Fred Glover, Ray Lanza, Peter Smith
Fujitsu:  Chris Seabrook
HAL Computer Systems, Inc.:  Prashant Dholakia, Howard Gayle,
     David H. Yamada
Hewlett-Packard Co.:  Larry Dwyer, Hal Prince
IBM Corp.:  Bill Baker, Mark Brown
MacNeal-Schwendler Corp.:  David Lombard
NCR:  Kevin Brasche, Shawn Shealy
NEC Systems Laboratory, Inc.:  Jeff Forys
Novell:  Bill Cox, John Kiger, Seth Rosenthal
NOVON Research Inc.:  Brian Boyle
Oracle:  Mark Johnson
Programmed Logic Corp.:  Tim Williams, Steve Rago
Pyramid Technology Corporation:  Ralph Campbell, Henry Robinson
SAS Institute Inc.:  Mark Cates, Leigh Ihnen, Tom Truscott,
     Kelly Wyatt
Sequent Computer Systems:  Gerrit Huizenga, Mike Spitzer
Siemens Nixdorf Inc.:  Ralf Nolting, Klaus Thon
Silicon Graphics:  Steve Cobb, Adam Sweeney
Stratus Computer Inc.:  Tony Luck 
Sun Microsystems, Inc.:  Steve Chessin
SunSoft Inc.:  Karen Barnes, Don Cragun, Karl Danz, Andy Roach,
     Glenn Skinner, Peter Van der Linden,
     Srinivasan Viswanathan
Sybase Inc.:  Marc Sugiyama
Syncsort Inc.:   Asokan
Tandem Computers:  David M. VomLehn
The Santa Cruz Operation, Inc.:  John Farley, Kurt Gollhardt,
     Art Herzog, Danielle Lahmani, Wen-Ling Lu, Dave Prosser
Unisoft:  Guy Hadland
Unisys Corp.:  Steve Beck, Bruce Jones, Scott Lurndal,
     Jim Soddy
UTG Inc.:  Michael Dortch, Mark Hatch, Larry Lytle
Veritas:  Craig Harmer, Michael Schmitz

Special thanks go to SAS Institute Inc., SunSoft, Silicon Graphics, and Convex Computer (now HP) for providing meeting rooms and logistics support. Hal Prince and Don Cragun provided technical guidance and kept us aware of the details. Mark Brown helped us understand how important it was to comply with existing standards. Bill Baker and Tom White worked hard typing early drafts and providing alternative ways to organize the document. Adam Sweeney and Howard Gayle kept us within reason. David VomLehn and Tom Truscott kept good notes and provided the minutes. Ray Lanza gave us rousing encouragement ("Just make everything 64 bits!!"). Mark Johnson quipped excellent summaries. Kelly Wyatt did the final edits and provided an excellent sanity check during the endgame. And special thanks go to Mark Hatch (now with Integrated Computer Solutions, Inc.) who organized the first meetings and got this effort going.

I really enjoyed participating and would like to express my gratitude to the members of the Large File Summit. In particular, I enjoyed participating with people who were so honestly motivated to make the right technical decisions. This was a great lesson in UNIX file system semantics and how the Open Systems Process works.

There are a couple of interesting features of this specification. First, it contains a method of supporting an industry-wide transition to full 64-bit APIs. Second, it specifies a set of changes to the Single UNIX Specification that will allow unlimited file offsets. The transition includes a way to add 64-bit file indexing without breaking current compliance with standards, and allows software developers to migrate existing sources and binaries to systems that support 64-bit file indexing.

This document is the result of a collaborative process that was open to all participants. The efforts of those who participated will best be rewarded by having this work accepted and used. I believe that this specification is an example of how well the industry can work together to solve problems that affect our ability to produce products that compete in the market.

John Carl Zeigler, jcz@utg.org
VP Technology, UTG Inc.
Cary, NC

1.0 Overview

1.1 The Large File Problem

As UNIX systems have become increasingly powerful, a number of system vendors and UNIX independent software vendors have developed a requirement to access files that contain more information than can be addressed using a signed long integer. One possible solution could be to convert every program using files to a larger size for long integers, including the operating system. However, the work to do this is undesirable for many vendors. A number of major system vendors and users have been meeting at the "Large File Summit" (LFS) for over a year to develop a set of changes to the existing Single UNIX Specification (SUS) that allow both new and converted programs to address files of arbitrary sizes. This set of changes will be provided to X/Open for inclusion into the next version of the SUS. In addition, a set of transitional extensions intended to permit users to immediately implement large file support on typical 32-bit UNIX operating systems is proposed. Both the changes and the transitional extensions, together with the rationale behind their definition, are included in this document.

1.2 Requirements

The LFS has worked to develop a solution to the large file problem meeting the following requirements:

Be implementable at a reasonable cost: Several of the LFS members are leading efforts to develop and implement solutions. Results from their experiences have guided our decisions.
Protect existing programs: This proposal allows for protection of existing programs. Many of the solutions considered would have caused existing programs to fail unexpectedly and silently. This proposal has been carefully crafted to reduce this possibility.
Provide access to files much larger than 2 gigabytes on 32-bit operating systems: This is the requirement that first motivated the LFS activity. The proposed changes implement a solution that allows file size and related sizes to be uncoupled from the size of the C language data types chosen for an operating environment. As a result, systems conforming to the proposed changes to the SUS can support files of arbitrary sizes.
Be fully compliant with the SUS: Systems modified to support the proposed extensions can be configured to strictly conform to the existing SUS. These same systems will normally be configured to fully meet the proposed changes supporting arbitrary file sizes and remain compliant with the SUS with extensions. In addition, conforming systems can support a transitional API extension designed to substantially reduce the difficulty of conversion to this proposed standard while remaining compliant with the existing SUS. This transitional interface is contained in section 3.0 Transitional Extensions to the Single UNIX Specification.
Provide an extension to the SUS: While the LFS would like to see this proposal included in the next version of the SUS, this specification provides extensions that system vendors and independent software vendors need to support this functionality in their current compliant products.

1.3 Importance

As noted earlier, several vendors have already begun or completed implementation because of substantial market pressures. Independent software vendors are already writing software dependent on large file functionality. Rapid inclusion into the SUS is necessary to avoid repeating the existing situation where over 20 different implementations of asynchronous I/O are available on various UNIX systems. The LFS has chosen design alternatives to facilitate the needed rapid process of standardization. We believe the proposed changes will substantially enhance the value of the next revision of the SUS if they are included.

1.4 Concepts

The proposed changes are motivated by a consistent implementation of a few very basic technical concepts.

Mixed sizes of off_t

During a period of transition from existing systems to systems able to support an arbitrarily large file size, most systems will need to support binaries with two or more sizes of the off_t data type (and related data types). This mixed off_t environment may occur on a system with an ABI that supports different sizes of off_t. It may occur on a system which has both a 64-bit and a 32-bit ABI. Finally, it may occur when using a distributed system where clients and servers have differing sizes of off_t. In effect, the period of transition will not end until we need 128-bit file sizes, requiring yet another transition! The proposed changes may also be used as a model for the 64 to 128-bit file size transition.

Offset maximum

Most, but unfortunately not all, of the numeric values in the SUS are protected by opaque type definitions. In theory this allows programs to use these types rather than the underlying C language data types to avoid issues like overflow. However, most existing code maps these opaque data types like off_t to long integers that can overflow for the values needed to represent the offsets possible in large files.

To protect existing binaries from arbitrarily large files, a new value (offset maximum) will be part of the open file description. An offset maximum is the largest offset that can be used as a file offset. Operations attempting to go beyond the offset maximum will return an error. The offset maximum is normally established as the largest value that can be represented in the off_t "extended signed integral type" used by the program creating the file description.

The open() function and other interfaces establish the offset maximum for a file description, returning an error if the file size is larger than the offset maximum at the time of the call. Returning errors when the offset maximum is (or is likely to be) exceeded protects existing binaries effectively.
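As an illustration of how a program might react to this behavior, the following is a minimal sketch, assuming an implementation that reports EOVERFLOW from open() as proposed in 2.2.1.24. The helper name open_checked() is hypothetical and is not part of the SUS or of this proposal.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* open_checked() is a hypothetical helper, not part of the SUS.
 * It opens a file read-only and distinguishes the proposed
 * EOVERFLOW failure from other errors.
 * Returns the file descriptor, or -1 on failure. */
int open_checked(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1) {
        if (errno == EOVERFLOW) {
            /* The file size is not representable in this program's off_t. */
            fprintf(stderr, "%s: too large for this program's offset maximum\n",
                    path);
        } else {
            fprintf(stderr, "%s: %s\n", path, strerror(errno));
        }
    }
    return fd;
}
```

A program that receives EOVERFLOW here would typically need to be rebuilt with a larger off_t before it can process the file.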

EOVERFLOW

In a system with binaries compiled to support different sizes of off_t, operations such as read() or write() can attempt to reach parts of a large file beyond the range of an off_t or other limit. The existing SUS does not define an error for this case. EOVERFLOW is an existing error type that must be added to a number of system interfaces to communicate the new error condition to applications.

Development models

In addition to supporting environments requiring mixed sizes of off_t, the LFS also considered the development model. To maintain older programs that have not been converted to support arbitrary file sizes, it is necessary to specify the size of off_t and related data types. Two compilation models and the means to control them are specified in section 3.3 Accessing the Extensions to the SUS. A new set of transitional extensions will probably be needed when the next jump to larger file sizes occurs. The changes specified for the SUS, however, are size neutral.

Selectable off_t: In this model, the size of off_t is specified at compile time, and the appropriate set of libraries, headers and data types is chosen during the compilation and linking process. All existing binaries default to an off_t the size of a long integer.
Explicit off_t: In this model, the size of off_t is specified during application design. The system interface specified explicitly uses an off_t of a particular length. On a 32-bit system, for example, use of open() implies an off_t of 32 bits and use of open64() implies an off64_t of 64 bits. While the model is very useful for supporting incremental conversions and writing system software, it is not directly supported in the SUS. A proposed set of transitional extensions is described in section 3.0 Transitional Extensions to the Single UNIX Specification. These transitional interfaces support only the 32-bit to 64-bit file offset transition.
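The selectable off_t model can be sketched as follows. This is an illustration only: it assumes that _FILE_OFFSET_BITS is the compile-time control described in section 3.3 (that section is not reproduced here), and the helper off_t_bits() is hypothetical.

```c
/* Selectable off_t model: the size is requested before any header
 * is included. _FILE_OFFSET_BITS is assumed here to be the control
 * macro described in section 3.3. */
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <sys/types.h>

/* Hypothetical helper: number of bits in this program's off_t. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

The source otherwise continues to use off_t and the ordinary interface names such as open(), which is what distinguishes this model from the explicit model with its open64() and off64_t names.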

1.5 Changes and Additions

The requirements and concepts defined above have been consistently and completely applied to the SUS to generate the changes and additions specified in sections 2.0 Changes to the Single UNIX Specification and 3.0 Transitional Extensions to the Single UNIX Specification. The changes are classified as:

Changes to System Interface Definitions

The terms extended signed integral type, extended unsigned integral type, offset maximum and saved resource limits have been defined.

Changes to System Interfaces and Headers

EOVERFLOW, EFBIG and EINVAL are added or updated wherever needed.

The open() and fcntl() functions have been changed to support the offset maximum.

The fseeko() and ftello() functions have been added because the existing fseek() and ftell() do not use the required opaque types.

Data types, declarations and symbolic constants were added to or changed in headers.

Changes to Commands and Utilities

The utilities that must be converted to establish a minimally complete system supporting large files are defined. A complete conversion of all utilities is both expensive and unnecessary for effective use of large files.

Transitional Extensions

The proposed transitional extensions including interfaces, macros and data types have been defined.

1.6 Conformance

A conforming implementation will supply all the interfaces that are specified in 2.0 Changes to the Single UNIX Specification (except that implementations need not provide the asynchronous I/O interfaces: aio_read(), aio_write(), and lio_listio()) and will define _LFS_LARGEFILE to be 1 (see 3.1.2.12 <unistd.h>).

A conforming implementation that provides asynchronous I/O interfaces and the extensions to them specified in 2.0 Changes to the Single UNIX Specification will define _LFS_ASYNCHRONOUS_IO to be 1 (see 3.1.2.12 <unistd.h>).

A conforming implementation that provides the explicit 64-bit interfaces will provide at least those interfaces specified in 3.1.1.1.3 Other Interfaces, 3.1.1.2 fcntl(), 3.1.1.3 open(), and 3.1.2 Transitional Extensions to Headers (except that changes specified in 3.1.2.2 <aio.h> and 3.1.2.6 <stdio.h> need not be supported) and will define _LFS64_LARGEFILE to be 1 (see 3.1.2.12 <unistd.h>).

A conforming implementation that defines _LFS64_LARGEFILE to be 1 and provides the explicit 64-bit interfaces for asynchronous I/O specified in 3.1.1.1.1 Asynchronous I/O Interfaces will define _LFS64_ASYNCHRONOUS_IO to be 1 (see 3.1.2.12 <unistd.h>).

A conforming implementation that defines _LFS64_LARGEFILE to be 1 and provides the explicit 64-bit STDIO interfaces specified in 3.1.1.1.2 STDIO Interfaces and 3.1.2.6 <stdio.h> will define _LFS64_STDIO to be 1 (see 3.1.2.12 <unistd.h>).
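An application could test these conformance macros at compile time. The sketch below is one possible way to do so, assuming only the macro names given above; the helper function names are hypothetical.

```c
#include <unistd.h>

/* Hypothetical helpers: report which of the conformance macros
 * from section 1.6 the implementation defines to be 1.
 * Each returns 1 if so, 0 otherwise. */
int have_lfs_largefile(void)
{
#if defined(_LFS_LARGEFILE) && _LFS_LARGEFILE == 1
    return 1;
#else
    return 0;
#endif
}

int have_lfs64_largefile(void)
{
#if defined(_LFS64_LARGEFILE) && _LFS64_LARGEFILE == 1
    return 1;
#else
    return 0;
#endif
}

int have_lfs64_stdio(void)
{
#if defined(_LFS64_STDIO) && _LFS64_STDIO == 1
    return 1;
#else
    return 0;
#endif
}
```

The same pattern would apply to _LFS_ASYNCHRONOUS_IO and _LFS64_ASYNCHRONOUS_IO.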

2.0 Changes to the Single UNIX Specification

2.1 Changes to CAE Specification System Interface Definitions, Issue 4, Version 2

The following definitions will be added to System Interface Definitions, Chapter 2, Glossary:

extended signed integral type: a signed integral type or an implementation-specific type with similar properties.
extended unsigned integral type: an unsigned integral type or an implementation-specific type with similar properties.
offset maximum: an attribute of an open file description representing the largest value that can be used as a file offset.
saved resource limits: an attribute of a process that provides some flexibility in the handling of unrepresentable resource limits, as described in the exec family of functions and setrlimit().
(Note the attribute "resource limits" as used in the SUS is not defined.)

2.2 Changes to CAE Specification System Interfaces and Headers, Issue 4, Version 2

2.2.1 Changes to System Interfaces

The following changes will be made to System Interfaces and Headers, Chapter 3, System Interfaces. The Asynchronous I/O interfaces (aio_read(), aio_write() and lio_listio()) should be included when POSIX.1b is added in a future revision to the SUS.

2.2.1.1 aio_read()

DESCRIPTION

For regular files, no data transfer will occur past the offset maximum established in the open file description associated with aiocbp->aio_fildes.

ERRORS

The following is an additional condition which may be detected synchronously or asynchronously:

[EOVERFLOW]
The file is a regular file, aiocbp->aio_nbytes is greater than 0 and the starting offset in aiocbp->aio_offset is before the end-of-file and is at or beyond the offset maximum in the open file description associated with aiocbp->aio_fildes.
Note: This is a new error condition.

2.2.1.2 aio_write()

DESCRIPTION

For regular files, no data transfer will occur past the offset maximum established in the open file description associated with aiocbp->aio_fildes.

ERRORS

The following is an additional condition which may be detected synchronously or asynchronously:

[EFBIG]
The file is a regular file, aiocbp->aio_nbytes is greater than 0 and the starting offset in aiocbp->aio_offset is at or beyond the offset maximum in the open file description associated with aiocbp->aio_fildes.
Note: This is an additional EFBIG error condition.

2.2.1.3 exec

DESCRIPTION

The saved resource limits in the new process image are set to be a copy of the process's corresponding hard and soft resource limits.

2.2.1.4 fclose(), fflush(), fputwc(), fputws(), fseek(), putwc(), putwchar()

ERRORS

These functions will fail if:

[EFBIG]
The file is a regular file and an attempt was made to write at or beyond the offset maximum associated with the corresponding stream.
Note: This is an additional EFBIG error condition.

2.2.1.5 fcntl()

DESCRIPTION

An unlock (F_UNLCK) request in which l_len is non-zero and the offset of the last byte of the requested segment is the maximum value for an object of type off_t, when the process has an existing lock in which l_len is 0 and which includes the last byte of the requested segment, will be treated as a request to unlock from the start of the requested segment with an l_len equal to 0. Otherwise an unlock (F_UNLCK) request will attempt to unlock only the requested segment.

ERRORS

The fcntl() function will fail if:

[EOVERFLOW]
One of the values to be returned cannot be represented correctly.
[EOVERFLOW]
The cmd argument is F_GETLK, F_SETLK or F_SETLKW and the smallest or, if l_len is non-zero, the largest, offset of any byte in the requested segment cannot be represented correctly in an object of type off_t.
Note: These are new error conditions.

2.2.1.6 fdopen()

DESCRIPTION

The fdopen() function will preserve the offset maximum previously set for the open file description corresponding to fildes.

2.2.1.7 fgetc(), fgets(), fgetwc(), fgetws(), fread(), fscanf(), getc(), getchar(), gets(), getw(), getwc(), getwchar(), scanf()

ERRORS

These functions will fail if data needs to be read and:

[EOVERFLOW]
The file is a regular file and an attempt was made to read at or beyond the offset maximum associated with the corresponding stream.
Note: This is a new error condition.

2.2.1.8 fgetpos()

ERRORS

The fgetpos() function will fail if:

[EOVERFLOW]
The current value of the file position cannot be represented correctly in an object of type fpos_t.
Note: This is a new error condition.

2.2.1.9 fopen(), freopen(), tmpfile()

DESCRIPTION

The largest value that can be represented correctly in an object of type off_t will be established as the offset maximum in the open file description.

ERRORS

The fopen() and freopen() functions will fail if:

[EOVERFLOW]
The named file is a regular file and the size of the file cannot be represented correctly in an object of type off_t.
Note: This is a new error condition.

2.2.1.10 fpathconf() and pathconf()

DESCRIPTION

  Variable          Value of name          Notes
  FILESIZEBITS      _PC_FILESIZEBITS       3,4
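A program might query this variable as sketched below. The helper filesize_bits() is hypothetical, and the fallback branch assumes that some systems may not define _PC_FILESIZEBITS at all.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: return FILESIZEBITS for the given directory,
 * or -1 if the value is unavailable. When pathconf() returns -1,
 * errno == 0 means the variable has no limit and a non-zero errno
 * means the call failed. */
long filesize_bits(const char *dir)
{
#ifdef _PC_FILESIZEBITS
    errno = 0;
    return pathconf(dir, _PC_FILESIZEBITS);
#else
    (void)dir;              /* constant not provided by this system */
    return -1;
#endif
}
```

For example, a backup tool could call filesize_bits() on a destination directory before copying a file whose size needs more than 32 bits to represent.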

2.2.1.11 fprintf(), fputc(), fputs(), fwrite(), printf(), putc(), putchar(), puts(), putw(), vfprintf(), vprintf()

ERRORS

These functions will fail if either the stream is unbuffered or the stream's buffer needed to be flushed and:

[EFBIG]
The file is a regular file and an attempt was made to write at or beyond the offset maximum.
Note: This is an additional EFBIG error condition.

2.2.1.12 fseek()

ERRORS

The fseek() function will fail if:

[EOVERFLOW]
The resulting file offset would be a value which cannot be represented correctly in an object of type long.
Note: This is a new error condition.

2.2.1.13 fseeko()

DESCRIPTION

The fseeko() function is identical to the modified fseek() except that the offset argument is of type off_t and the EOVERFLOW error is changed as follows:

ERRORS

[EOVERFLOW]
The resulting file offset would be a value which cannot be represented correctly in an object of type off_t.
Note: This is a new function.

2.2.1.14 fstat(), lstat() and stat()

ERRORS

These functions will fail if:

[EOVERFLOW]
The file size in bytes or the number of blocks allocated to the file or the file serial number cannot be represented correctly in the structure pointed to by buf.
Note: This is an additional EOVERFLOW error condition.
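A caller might separate this condition from other stat() failures as in the following sketch. The helper file_size() and its return codes are hypothetical choices made for illustration.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: store the size of the named file in *size_out.
 * Returns  0 on success,
 *         -2 if stat() failed with EOVERFLOW (a value does not fit
 *            in this program's struct stat),
 *         -1 on any other failure. */
int file_size(const char *path, off_t *size_out)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return (errno == EOVERFLOW) ? -2 : -1;

    *size_out = sb.st_size;
    return 0;
}
```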

2.2.1.15 fstatvfs() and statvfs()

ERRORS

These functions will fail if:

[EOVERFLOW]
One of the values to be returned cannot be represented correctly in the structure pointed to by buf.
Note: This is a new error condition.

2.2.1.16 ftell()

ERRORS

The ftell() function will fail if:

[EOVERFLOW]
The current file offset cannot be represented correctly in an object of type long.
Note: This is a new error condition.

2.2.1.17 ftello()

DESCRIPTION

The ftello() function is identical to the modified ftell() except that the return value is of type off_t and the EOVERFLOW error is changed as follows:

ERRORS

[EOVERFLOW]
The current file offset cannot be represented correctly in an object of type off_t.
Note: This is a new function.
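The following sketch shows how fseeko() and ftello() might be used together to find the size of a stream without passing offsets through type long. The helper stream_size() is hypothetical, and the _POSIX_C_SOURCE definition is an assumption about how some current implementations expose these two functions; it is not part of this proposal.

```c
/* Assumption: some implementations declare fseeko()/ftello() only
 * when a POSIX feature test macro is defined before the headers. */
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: return the size of the stream in bytes,
 * leaving the file position where it was. Returns (off_t)-1 on
 * error, including the EOVERFLOW case described above. */
off_t stream_size(FILE *fp)
{
    off_t saved = ftello(fp);
    off_t end;

    if (saved == (off_t)-1)
        return (off_t)-1;
    if (fseeko(fp, 0, SEEK_END) != 0)
        return (off_t)-1;
    end = ftello(fp);
    if (fseeko(fp, saved, SEEK_SET) != 0)
        return (off_t)-1;
    return end;
}
```

The equivalent code written with fseek() and ftell() would fail with EOVERFLOW as soon as the end of the file lies beyond the range of a long.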

2.2.1.18 ftruncate()

ERRORS

The ftruncate() function will fail if:

[EFBIG]
The file is a regular file and length is greater than the offset maximum established in the open file description associated with fildes.
Note: This is an additional EFBIG error condition.

2.2.1.19 getrlimit() and setrlimit()

DESCRIPTION

When using the getrlimit() function, if a resource limit can be represented correctly in an object of type rlim_t then its representation is returned; otherwise if the value of the resource limit is equal to that of the corresponding saved hard limit the value returned is RLIM_SAVED_MAX; otherwise the value returned is RLIM_SAVED_CUR.
When using the setrlimit() function, if the requested new limit is RLIM_INFINITY the new limit will be "no limit"; otherwise if the requested new limit is RLIM_SAVED_MAX the new limit will be the corresponding saved hard limit; otherwise if the requested new limit is RLIM_SAVED_CUR the new limit will be the corresponding saved soft limit; otherwise the new limit will be the requested value. In addition, if the corresponding saved limit can be represented correctly in an object of type rlim_t then it will be overwritten with the new limit.
The result of setting a limit to RLIM_SAVED_MAX or RLIM_SAVED_CUR is unspecified unless a previous call to getrlimit() returned that value as the soft or hard limit for the corresponding resource limit.
The determination of whether a limit can be correctly represented in an object of type rlim_t is implementation-dependent. For example, some implementations permit a limit whose value is greater than RLIM_INFINITY and others do not.
The exec family of functions also causes resource limits to be saved (see 2.2.1.3 exec).
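The getrlimit() rules above can be illustrated with a small classification routine. This is a sketch under stated assumptions: the helper names and return codes are hypothetical, and the #ifdef guards assume that a system might not yet define RLIM_SAVED_MAX or RLIM_SAVED_CUR.

```c
#include <sys/resource.h>

/* Hypothetical helper: classify a value returned by getrlimit().
 * Returns 1 for "no limit", 2 for an unrepresentable saved hard
 * limit, 3 for an unrepresentable saved soft limit, and 0 for an
 * ordinary representable value. Where the RLIM_SAVED_* constants
 * are not distinct from RLIM_INFINITY, the first test wins. */
int classify_limit(rlim_t value)
{
    if (value == RLIM_INFINITY)
        return 1;
#ifdef RLIM_SAVED_MAX
    if (value == RLIM_SAVED_MAX)
        return 2;
#endif
#ifdef RLIM_SAVED_CUR
    if (value == RLIM_SAVED_CUR)
        return 3;
#endif
    return 0;
}

/* Hypothetical helper: classify the soft file size limit of the
 * calling process, or return -1 if getrlimit() fails. */
int soft_fsize_limit_class(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
        return -1;
    return classify_limit(rl.rlim_cur);
}
```

A program that receives class 2 or 3 can pass the same value back to setrlimit() unchanged, which is the case for which the result is specified.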

2.2.1.20 lio_listio()

DESCRIPTION

For regular files, no data transfer will occur past the offset maximum established in the open file description associated with aiocbp->aio_fildes.

ERRORS

The following are additional error codes which may be set for each aiocb control block:

[EOVERFLOW]
The aiocbp->aio_lio_opcode is LIO_READ, the file is a regular file, aiocbp->aio_nbytes is greater than 0, and the aiocbp->aio_offset is before the end-of-file and is greater than or equal to the offset maximum in the open file description associated with aiocbp->aio_fildes.
[EFBIG]
The aiocbp->aio_lio_opcode is LIO_WRITE, the file is a regular file, aiocbp->aio_nbytes is greater than 0, and the aiocbp->aio_offset is greater than or equal to the offset maximum in the open file description associated with aiocbp->aio_fildes.
Note: These are additional EFBIG and EOVERFLOW error conditions.

2.2.1.21 lockf()

DESCRIPTION

An F_ULOCK request in which size is non-zero and the offset of the last byte of the requested section is the maximum value for an object of type off_t, when the process has an existing lock in which size is 0 and which includes the last byte of the requested section, will be treated as a request to unlock from the start of the requested section with a size equal to 0. Otherwise an F_ULOCK request will attempt to unlock only the requested section.

ERRORS

The lockf() function will fail if:

[EINVAL]
The function argument is not one of F_LOCK, F_TLOCK, F_TEST or F_ULOCK; or size plus the current file offset is less than 0.
[EOVERFLOW]
The offset of the first, or if size is not 0 then the last, byte in the requested section cannot be represented correctly in an object of type off_t.
Note: This is a clarification of the EINVAL error condition.
Note: EOVERFLOW is a new error condition.

2.2.1.22 lseek()

ERRORS

The lseek() function will fail if:

[EOVERFLOW]
The resulting file offset would be a value which cannot be represented correctly in an object of type off_t.
Note: This is a new error condition.

2.2.1.23 mmap()

ERRORS

The mmap() function will fail if:

[EOVERFLOW]
The file is a regular file and the value of off plus len exceeds the offset maximum established in the open file description associated with fildes.
Note: This is a new error condition.

2.2.1.24 open()

DESCRIPTION

The largest value that can be represented correctly in an object of type off_t will be established as the offset maximum in the open file description.

ERRORS

The open() function will fail if:

[EOVERFLOW]
The named file is a regular file and the size of the file cannot be represented correctly in an object of type off_t.
Note: This is a new error condition.

2.2.1.25 read() and readv()

DESCRIPTION

For regular files, no data transfer will occur past the offset maximum established in the open file description associated with fildes.

ERRORS

The read() and readv() functions will fail if:

[EOVERFLOW]
The file is a regular file, nbyte is greater than 0, the starting position is before the end-of-file and the starting position is greater than or equal to the offset maximum established in the open file description associated with fildes.
Note: This is a new error condition.

2.2.1.26 readdir()

ERRORS

The readdir() function will fail if:

[EOVERFLOW]
One of the values in the structure to be returned cannot be represented correctly.
Note: This is a new error condition.

2.2.1.27 write() and writev()

DESCRIPTION

For regular files, no data transfer will occur past the offset maximum established in the open file description associated with fildes.

ERRORS

These functions will fail if:

[EFBIG]
The file is a regular file, nbyte is greater than 0 and the starting position is greater than or equal to the offset maximum established in the open file description associated with fildes.
Note: This is an additional EFBIG error condition.
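A write loop might report this condition separately from other failures, as in the sketch below. The helper write_all() and its return codes are hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf to fd, retrying
 * after short writes and interrupted calls.
 * Returns  0 when everything was written,
 *         -2 if write() failed with EFBIG (for example, at or
 *            beyond the offset maximum),
 *         -1 on any other failure. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted: try again */
            return (errno == EFBIG) ? -2 : -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```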

2.2.2 Changes to Headers

The following changes will be made to System Interfaces and Headers, Chapter 4, Headers.

2.2.2.1 <limits.h>

The following symbolic constant is defined as a Pathname Variable Value:

Name             Description                Acceptable Value
FILESIZEBITS     Minimum number of bits             *
                 needed to represent,
                 as a signed integer
                 value, the maximum size
                 of a regular file
                 allowed in the
                 specified directory.

2.2.2.2 <stdio.h>

The following are declared as functions and may also be defined as macros:

int         fseeko(FILE *stream, off_t offset, int whence);
off_t       ftello(FILE *stream);

The type off_t is defined through typedef as described in <sys/types.h>.

2.2.2.3 <sys/resource.h>

The following symbolic constants are defined:

RLIM_SAVED_MAX     A value of type rlim_t indicating an
                   unrepresentable saved hard limit.
RLIM_SAVED_CUR     A value of type rlim_t indicating an
                   unrepresentable saved soft limit.

On implementations where all resource limits are representable in an object of type rlim_t, RLIM_SAVED_MAX and RLIM_SAVED_CUR need not be distinct from RLIM_INFINITY.

2.2.2.4 <sys/stat.h>

The type of st_blocks in the stat structure will be changed to:

blkcnt_t    st_blocks   number of blocks allocated for this
                        object.

2.2.2.5 <sys/statvfs.h>

The types of the fields below in the statvfs structure will be changed to:

fsblkcnt_t  f_blocks    total number of blocks in the file
                        system in units of f_frsize.
fsblkcnt_t  f_bfree     total number of free blocks.
fsblkcnt_t  f_bavail    number of free blocks available to
                        non-privileged process.
fsfilcnt_t  f_files     total number of file serial numbers.
fsfilcnt_t  f_ffree     total number of free file serial
                        numbers.
fsfilcnt_t  f_favail    number of free file serial numbers
                        available to non-privileged process.
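
A short sketch of why the wider count types matter: the space available on a file system is a block count multiplied by f_frsize, and the product can easily exceed 2^31-1. The helper name free_space is hypothetical, and the use of unsigned long long for the result is an assumption (it requires a C99 or later compiler).

```c
#include <sys/statvfs.h>

/* Hypothetical helper: store in *bytes the space available to a
 * non-privileged process on the file system containing path.
 * Returns 0 on success, -1 if statvfs() fails. */
int free_space(const char *path, unsigned long long *bytes)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return -1;
    /* Widen both operands before multiplying to avoid overflow. */
    *bytes = (unsigned long long)vfs.f_bavail *
             (unsigned long long)vfs.f_frsize;
    return 0;
}
```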

2.2.2.6 <sys/types.h>

The following data types will be defined:

blkcnt_t                Used for file block counts.
fsblkcnt_t              Used for file system block counts.
fsfilcnt_t              Used for file system file counts.

The types blkcnt_t and off_t are defined as extended signed integral types.

The types fsblkcnt_t, fsfilcnt_t, and ino_t are defined as extended unsigned integral types.

2.2.2.7 <unistd.h>

The following symbolic constant is defined for pathconf():

_PC_FILESIZEBITS
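
A minimal sketch of querying this value follows. The helper name file_size_bits and its convention of returning 0 when no value is reported are assumptions made for illustration.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: return the FILESIZEBITS value for dir, 0 if the
 * implementation reports no value for it, or -1 on error. */
long file_size_bits(const char *dir)
{
    long bits;

    errno = 0;
    bits = pathconf(dir, _PC_FILESIZEBITS);
    if (bits == -1 && errno != 0)
        return -1;          /* pathconf() failed */
    if (bits == -1)
        return 0;           /* no value reported */
    return bits;
}
```

An application that needs files larger than 2^31-1 bytes in a given directory could check that the result is greater than 32 before starting work.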

2.3 Changes to CAE Specification Commands and Utilities, Issue 4, Version 2

The following changes will be made to Commands and Utilities, Chapter 3, Utilities.

2.3.1 Considerations for Utilities in Support of Files of Arbitrary Size

Note: This is a new section and should be added to Commands and Utilities, Issue 4, Version 2, Chapter 3 after section 1.2.1, Symbolic Links.

The following utilities will support files of any size up to the maximum that can be created by the implementation. This support includes correct writing of file size related values (such as file sizes and offsets, line numbers, and block counts) and correct interpretation of command line arguments that contain such values.

basename   return non-directory portion of pathname
cat        concatenate and print files
cd         change working directory
chgrp      change file group ownership
chmod      change file modes
chown      change file ownership
cksum      write file checksums and sizes
cmp        compare two files
cp         copy files
dd         convert and copy a file
df         report free disk space
dirname    return directory portion of pathname
du         estimate file space usage
find       find files
ln         link files
ls         list directory contents
mkdir      make directories
mv         move files
pathchk    check pathnames
pwd        return working directory name
rm         remove directory entries
rmdir      remove directories
sh         shell, the standard command language interpreter
sum        print checksum and block or byte count of a file
test       evaluate expression
touch      change file access and modification times
ulimit     set or report file size limit

Exceptions to the requirement that utilities support files of any size up to the maximum are:

Utilities such as tar and cpio cannot support arbitrary file sizes due to limitations imposed by fixed file formats.
Uses of files as command scripts, or for configuration or control, are exempt. For example, it is not required that sh be able to read an arbitrarily large ".profile".
Shell input and output redirection are exempt. For example, it is not required that the redirections sum < file or echo foo > file succeed for an arbitrarily large existing file.

2.3.2 The sh Utility

DESCRIPTION:

Pathname expansion will not fail due to the size of a file.
Shell input and output redirections will have an implementation-specific offset maximum that will be established in the open file description.

2.3.3 The pax Utility

APPLICATION USAGE

The pax utility is not able to handle arbitrary file sizes. There is currently a proposal in ballot in IEEE Project 1003.2b to address this issue.

3.0 Transitional Extensions to the Single UNIX Specification

The interfaces, macros and data types in this section are explicitly 64-bit instances of the corresponding SUS and POSIX.1b interfaces, macros and data types. The function prototype and semantics of a transitional interface will be equivalent to those of the SUS version of the call. Version test macros announcing extensions to the SUS are also defined.

The transitional extensions in this section are intended to be temporary. While an application using this specification may be using non-POSIX conforming transitional extensions to operating system functions, this does not require that system vendors break their POSIX compliance. This specification is intended to be compatible with the standards. The transitional extensions are provided so that system vendors may define a common set of large file capable extensions to their current compliant systems without violating that compliance.

3.1 Transitional Extensions to CAE Specification System Interfaces and Headers, Issue 4, Version 2

3.1.1 Transitional Extensions to System Interfaces

3.1.1.1 64-bit Versions of Interfaces

The following interfaces are explicitly 64-bit versions of the corresponding Single UNIX Specification and POSIX.1b interfaces. There is no functional difference between these and the corresponding Single UNIX Specification and POSIX.1b interfaces.

3.1.1.1.1 Asynchronous I/O Interfaces

aio_cancel64()         aio_error64()  
aio_fsync64()          aio_read64()   
aio_return64()         aio_suspend64()
aio_write64()          lio_listio64()

3.1.1.1.2 STDIO Interfaces

fgetpos64()            fopen64()      
freopen64()            fseeko64()     
fsetpos64()            ftello64()         
tmpfile64()

3.1.1.1.3 Other Interfaces

creat64()             fstat64()      
fstatvfs64()          ftruncate64()
ftw64()               getrlimit64()
lockf64()             lseek64()
lstat64()             mmap64()
nftw64()              open64()
readdir64()           setrlimit64()
stat64()              statvfs64()
truncate64()

3.1.1.2 fcntl()

DESCRIPTION

The following additional value may be used in constructing oflag:

O_LARGEFILE
If set, the offset maximum in the open file description will be the largest value that can be represented correctly in an object of type off64_t.
The behavior of the following additional values is equivalent to the corresponding Single UNIX Specification value (F_GETLK, F_SETLK, F_SETLKW), but they take a struct flock64 argument rather than a struct flock argument.

F_GETLK64
F_SETLK64
F_SETLKW64

3.1.1.3 open()

DESCRIPTION

The following additional value may be used in constructing oflag:

O_LARGEFILE
If set, the offset maximum in the open file description will be the largest value that can be represented correctly in an object of type off64_t.

ERRORS

The open() function will fail if:

[EOVERFLOW]
The named file is a regular file and either O_LARGEFILE is not set and the size of the file cannot be represented correctly in an object of type off_t or O_LARGEFILE is set and the size of the file cannot be represented correctly in an object of type off64_t.

APPLICATION USAGE

Note that using open64() is equivalent to using open() with O_LARGEFILE set in oflag.

Note: For the transitional extensions these changes to open() are in place of the changes described in 2.2.1.24 open() relating to the changes to the SUS.
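
The following sketch shows one way an application might request the larger offset maximum and report the EOVERFLOW condition described above. The helper name open_large is hypothetical.

```c
#define _LARGEFILE64_SOURCE 1   /* expose O_LARGEFILE, see 3.3.2 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: open path read-only with the offset maximum of
 * off64_t. Returns the file descriptor, or -1 with errno set. */
int open_large(const char *path)
{
    int fd = open(path, O_RDONLY | O_LARGEFILE);

    if (fd == -1 && errno == EOVERFLOW)
        fprintf(stderr, "%s: size not representable in off64_t\n", path);
    return fd;
}
```

Per the APPLICATION USAGE note, calling open64(path, O_RDONLY) would be expected to behave the same way.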

3.1.2 Transitional Extensions to Headers

The modifications to the headers in this section are necessary to implement the transitional extensions as described in 3.0 Transitional Extensions to the Single UNIX Specification.

3.1.2.1 64-bit Versions of Headers

In summary, the changes to the headers involve the following data types, structures and symbolic constants:

3.1.2.1.1 Data Types

blkcnt_t               fsblkcnt_t
fsfilcnt_t             fpos_t
ino_t                  off_t
rlim_t

3.1.2.1.2 Structures

struct dirent          struct flock
struct rlimit          struct stat
struct statvfs

3.1.2.1.3 Symbolic Constants

F_GETLK                F_SETLK
F_SETLKW               RLIM_INFINITY
RLIM_SAVED_MAX         RLIM_SAVED_CUR

3.1.2.2 <aio.h>

The aiocb64 structure is defined in the same way as the aiocb structure in POSIX.1b with the exception of the following member:

off64_t        aio_offset

The following are declared as functions and may be defined as macros.

int     aio_read64(struct aiocb64 *aiocbp);
int     aio_write64(struct aiocb64 *aiocbp);
int     lio_listio64(int mode, struct aiocb64 *const list[],
            int nent, struct sigevent *sig);
int     aio_error64(const struct aiocb64 *aiocbp);
ssize_t aio_return64(struct aiocb64 *aiocbp);
int     aio_cancel64(int fildes, struct aiocb64 *aiocbp);
int     aio_suspend64(const struct aiocb64 *const list[],
            int nent, const struct timespec *timeout);
int     aio_fsync64(int op, struct aiocb64 *aiocbp);

3.1.2.3 <dirent.h>

The dirent64 structure is defined in the same way as the dirent structure in the Single UNIX Specification with the exception of the following member:

ino64_t       d_ino     file serial number.

The following is declared as a function and may also be defined as a macro:

struct dirent64 *readdir64(DIR *dirp);

3.1.2.4 <fcntl.h>

The flock64 structure is defined in the same way as the flock structure in the Single UNIX Specification with the exception of the following members:

off64_t       l_start relative offset in bytes.
off64_t       l_len   size.

Additional values for cmd used by fcntl():

F_GETLK64     Get record locking information using struct
              flock64.
F_SETLK64     Establish a record lock using struct flock64.
F_SETLKW64    Establish a record lock, blocking, using struct
              flock64.

An additional file status flag, used by open() and fcntl(), is defined:

O_LARGEFILE     The offset maximum in the open file description
                is the largest value that can be represented
                correctly in an object of type off64_t.

The following are declared as functions and may also be defined as macros:

int     creat64(const char *path, mode_t mode);
int     open64(const char *path, int oflag, ...);

3.1.2.5 <ftw.h>

The following are declared as functions and may also be defined as macros:

int ftw64(const char *path,
    int (*fn)(const char *, const struct stat64 *, int),
    int ndirs);
int nftw64(const char *path,
    int (*fn)(const char *, const struct stat64 *, int,
               struct FTW *),
    int depth, int flags);

3.1.2.6 <stdio.h>

The following data type is defined through typedef:

fpos64_t  Type containing all information needed to specify
          uniquely every position within a file in which the
          largest offset can be represented in an object of type
          off64_t.

The following are declared as functions and may also be defined as macros:

int       fgetpos64(FILE *stream, fpos64_t *pos);
FILE     *fopen64(const char *filename, const char *mode);
FILE     *freopen64(const char *filename, const char *mode,
               FILE *stream);
int       fseeko64(FILE *stream, off64_t offset, int whence);
int       fsetpos64(FILE *stream, const fpos64_t *pos);
off64_t   ftello64(FILE *stream);
FILE     *tmpfile64(void);
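
A small sketch of how fpos64_t might be used alongside fseeko64(): the stream position is saved, one byte is read at an arbitrary off64_t offset, and the saved position is restored. The helper name peek_char_at64 is hypothetical.

```c
#define _LARGEFILE64_SOURCE 1   /* expose the transitional stdio API */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: return the byte at offset in stream (or EOF),
 * leaving the stream position where it was before the call. */
int peek_char_at64(FILE *stream, off64_t offset)
{
    fpos64_t saved;
    int c;

    if (fgetpos64(stream, &saved) != 0)
        return EOF;
    if (fseeko64(stream, offset, SEEK_SET) != 0)
        return EOF;
    c = getc(stream);
    if (fsetpos64(stream, &saved) != 0)
        return EOF;
    return c;
}
```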

3.1.2.7 <sys/mman.h>

The following is declared as a function and may also be defined as a macro:

void     *mmap64(void *addr, size_t len, int prot, int flags,
                int fd, off64_t offset);

3.1.2.8 <sys/resource.h>

The following data type is defined through typedef:

rlim64_t    type used for limit values.

The type rlim64_t must be an extended unsigned arithmetic type that can represent correctly any non-negative value of an off64_t.

The following symbolic constants are defined:

RLIM64_INFINITY    A value of type rlim64_t indicating no limit.
RLIM64_SAVED_MAX   A value of type rlim64_t indicating an
                   unrepresentable saved hard limit.
RLIM64_SAVED_CUR   A value of type rlim64_t indicating an
                   unrepresentable saved soft limit.

On implementations where all resource limits are representable in an object of type rlim64_t, RLIM64_SAVED_MAX and RLIM64_SAVED_CUR need not be distinct from RLIM64_INFINITY.

The rlimit64 structure is defined in the same way as the rlimit structure in the Single UNIX Specification with the exception of the following members:

rlim64_t  rlim_cur      the current (soft) limit.
rlim64_t  rlim_max      the hard limit.

The following are declared as functions and may also be defined as macros:

int       getrlimit64(int resource, struct rlimit64 *rlp);
int       setrlimit64(int resource, const struct rlimit64 *rlp);
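
Since rlim64_t can represent any non-negative off64_t value, an intended file size can be compared directly against the limit. The sketch below assumes a hypothetical helper named fsize_limit_allows and, for brevity, does not treat RLIM64_SAVED_CUR specially.

```c
#define _LARGEFILE64_SOURCE 1   /* expose rlimit64, see 3.3.2 */
#include <sys/types.h>
#include <sys/resource.h>

/* Hypothetical helper: return 1 if the soft file size limit permits a
 * file of size bytes, 0 if it does not, -1 on error or negative size. */
int fsize_limit_allows(off64_t size)
{
    struct rlimit64 rl;

    if (size < 0 || getrlimit64(RLIMIT_FSIZE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM64_INFINITY)
        return 1;
    return (rlim64_t)size <= rl.rlim_cur;
}
```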

3.1.2.9 <sys/stat.h>

The stat64 structure is defined in the same way as the stat structure in the Single UNIX Specification with the exception of the following members:

ino64_t     st_ino      file serial number.
off64_t     st_size     file size in bytes.
blkcnt64_t  st_blocks   number of blocks allocated for this
                        object.

The following are declared as functions and may also be defined as macros:

int         fstat64(int fildes, struct stat64 *buf);
int         lstat64(const char *, struct stat64 *buf);
int         stat64(const char *, struct stat64 *buf);
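
As an illustration, a program could use stat64() to detect regular files that a 32-bit off_t could not describe. The helper name is_large_file is hypothetical.

```c
#define _LARGEFILE64_SOURCE 1   /* expose stat64, see 3.3.2 */
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: return 1 if path is a regular file larger than
 * 2^31-1 bytes, 0 if it is not, -1 if stat64() fails. */
int is_large_file(const char *path)
{
    struct stat64 sb;

    if (stat64(path, &sb) != 0)
        return -1;
    return S_ISREG(sb.st_mode) && sb.st_size > (off64_t)2147483647;
}
```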

3.1.2.10 <sys/statvfs.h>

The statvfs64 structure is defined in the same way as the statvfs structure in the Single UNIX Specification with the exception of the following members:

fsblkcnt64_t  f_blocks  total number of blocks in the file
                        system in units of f_frsize.
fsblkcnt64_t  f_bfree   total number of free blocks.
fsblkcnt64_t  f_bavail  number of free blocks available to
                        non-privileged process.
fsfilcnt64_t  f_files   total number of file serial numbers.
fsfilcnt64_t  f_ffree   total number of free file serial
                        numbers.
fsfilcnt64_t  f_favail  number of free file serial numbers
                        available to non-privileged process.

The following are declared as functions and may also be defined as macros:

int         statvfs64(const char *path, struct statvfs64 *buf);
int         fstatvfs64(int fildes, struct statvfs64 *buf);

3.1.2.11 <sys/types.h>

The following data types are defined through typedef:

blkcnt64_t      Used for file block counts.
fsblkcnt64_t    Used for file system block counts.
fsfilcnt64_t    Used for file system file counts.
ino64_t         Used for file serial numbers.
off64_t         Used for file sizes.

The types blkcnt64_t and off64_t are defined as extended signed integral types.

The types fsblkcnt64_t, fsfilcnt64_t, and ino64_t are defined as extended unsigned integral types.

3.1.2.12 <unistd.h>

The following are declared as functions and may also be defined as macros:

int         lockf64(int fildes, int function, off64_t size);
off64_t     lseek64(int fildes, off64_t offset, int whence);
int         ftruncate64(int fildes, off64_t length);
int         truncate64(const char *path, off64_t length);
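
The sketch below illustrates lseek64() with an offset beyond 2^31-1. No data is written, so the file itself stays empty. The helper name create_and_seek64 is hypothetical, and creat64() comes from <fcntl.h> as described in 3.1.2.4.

```c
#define _LARGEFILE64_SOURCE 1   /* expose lseek64/creat64, see 3.3.2 */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) path, seek to offset, and
 * return the offset reported by lseek64(), or (off64_t)-1 on error. */
off64_t create_and_seek64(const char *path, off64_t offset)
{
    off64_t pos;
    int fd = creat64(path, 0600);

    if (fd == -1)
        return (off64_t)-1;
    pos = lseek64(fd, offset, SEEK_SET);
    close(fd);
    return pos;
}
```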
   
Version Test Macros:
_LFS_LARGEFILE   is defined to be 1 if the implementation
                 supports the interfaces as specified in
                 2.2.1 Changes to System Interfaces
                 except that implementations need not provide
                 the asynchronous I/O interfaces: aio_read(),
                 aio_write(), and lio_listio().
_LFS_ASYNCHRONOUS_IO
                 is defined to be 1 if the implementation
                 supports the asynchronous IO interfaces:
                 aio_read(), aio_write(), and lio_listio() as
                 specified in 2.2.1 Changes to
                 System Interfaces.
_LFS64_ASYNCHRONOUS_IO
                 is defined to be 1 if the implementation
                 supports all the transitional extensions
                 listed in 3.1.1.1.1 Asynchronous I/O Interfaces
                 and 3.1.2.2 <aio.h>.
_LFS64_LARGEFILE is defined to be 1 if the implementation
                 supports all the transitional extensions
                 listed in 3.1.1.1.3 Other Interfaces,
                 3.1.1.2 fcntl(), 3.1.1.3 open() and
                 3.1.2 Transitional Extensions to Headers,
                 except changes specified in 3.1.2.2 <aio.h>
                 and 3.1.2.6 <stdio.h> need not be supported.
_LFS64_STDIO     is defined to be 1 if the implementation
                 supports all the transitional extensions
                 listed in 3.1.1.1.2 STDIO Interfaces
                 and 3.1.2.6 <stdio.h>.
                  
                 If _LFS64_STDIO is not defined to be 1 and the
                 underlying file description associated with
                 stream has O_LARGEFILE set then the behavior
                 of the Standard I/O functions is unspecified.
                
Constants for Functions:
    _CS_LFS_CFLAGS       for confstr().
    _CS_LFS_LDFLAGS      for confstr().
    _CS_LFS_LIBS         for confstr().
    _CS_LFS_LINTFLAGS    for confstr().

    _CS_LFS64_CFLAGS     for confstr().
    _CS_LFS64_LDFLAGS    for confstr().
    _CS_LFS64_LIBS       for confstr().
    _CS_LFS64_LINTFLAGS  for confstr().
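
A hedged sketch of how a program might test for the transitional API at compile time and fetch one of the configuration strings at run time. The helper names lfs64_supported and lfs_cflags are hypothetical. The string returned on any particular implementation is not specified here and may be empty.

```c
#define _LARGEFILE64_SOURCE 1
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: 1 if <unistd.h> announces the transitional
 * large file interfaces, 0 otherwise. */
int lfs64_supported(void)
{
#if defined(_LFS64_LARGEFILE) && _LFS64_LARGEFILE == 1
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: return the LFS_CFLAGS string in malloc'ed
 * storage (caller frees), or NULL if it is unavailable. */
char *lfs_cflags(void)
{
    size_t n = confstr(_CS_LFS_CFLAGS, NULL, 0);
    char *buf;

    if (n == 0)
        return NULL;
    buf = malloc(n);
    if (buf != NULL)
        confstr(_CS_LFS_CFLAGS, buf, n);
    return buf;
}
```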

3.2 Transitional Extensions to the mount Utility

3.2.1 Optional Additional Option for the mount utility

If the -o nolargefiles option is specified and is supported by the file system, then for the duration of the mount it is guaranteed that all regular files in the file system have a file size that will fit in the smallest object of type off_t supported by the system performing the mount. The mount will fail if there are any files in the file system not meeting this criterion.

If -o largefiles is specified then there is no such guarantee.

The default behavior is implementation-dependent.

3.3 Accessing the Extensions to the SUS

3.3.1 Compilation Environment - Visibility of Additions to the API

Applications which define the macro _LARGEFILE_SOURCE to be 1 before inclusion of any header will enable at least the functionality described in 2.0 Changes to the Single UNIX Specification on implementations that support these features. Implementations that support these features will define _LFS_LARGEFILE to be 1 in <unistd.h>, as described in 3.1.2.12 <unistd.h>.

3.3.2 Compilation Environment - Visibility of Transitional API

Applications which define the macro _LARGEFILE64_SOURCE to be 1 before inclusion of any header will enable at least the fseeko(), ftello() extensions to the SUS (see 2.2.1.13 fseeko(), 2.2.1.17 ftello() and 2.2.2.2 <stdio.h>) and the transitional extensions described in 3.1 Transitional Extensions to CAE Specification System Interfaces and Headers, Issue 4, Version 2 on implementations that support these features. Implementations that support these features will define _LFS64_LARGEFILE, _LFS64_ASYNCHRONOUS_IO and _LFS64_STDIO to be 1 in <unistd.h>, as described in 3.1.2.12 <unistd.h>.

3.3.3 Mixed API and Compile Environments Within a Single Process

It is permitted to use both the Single UNIX Specification and the transitional APIs within the same executable, including within the same source file, and to use both on the same file descriptor whether in the same process or in different processes (when an open file descriptor is passed or inherited).
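
As a sketch of such mixed use, the function below queries the current offset of one file descriptor through both lseek64() and lseek(). Where off_t is narrower than off64_t, the lseek() call may fail with EOVERFLOW for large offsets. The helper name offsets_agree is hypothetical.

```c
#define _LARGEFILE64_SOURCE 1
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if both APIs report the same offset,
 * 0 if lseek() reported EOVERFLOW, -1 on any other error. */
int offsets_agree(const char *path)
{
    off64_t wide;
    off_t narrow;
    int saved_errno;
    int fd = open64(path, O_RDONLY);

    if (fd == -1)
        return -1;
    wide = lseek64(fd, 0, SEEK_CUR);    /* transitional API */
    narrow = lseek(fd, 0, SEEK_CUR);    /* SUS API, same descriptor */
    saved_errno = errno;
    close(fd);
    if (wide == (off64_t)-1)
        return -1;
    if (narrow == (off_t)-1)
        return saved_errno == EOVERFLOW ? 0 : -1;
    return (off64_t)narrow == wide;
}
```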

3.3.4 Utilities: Optional Method for Specifying the Size of an off_t

For programs to take advantage of different environments, it is necessary to compile them for each particular environment. For programs to make use of the features described in this section they must be compiled with new compiler and linker options. The getconf utility called with the new arguments can be used to generate compiler and linker options.

Example 1:

An example of compiling a program that has a "large" off_t, uses fseeko() and ftello(), and uses yacc:

   c89 -D_LARGEFILE_SOURCE     -o foo      \
        $(getconf LFS_CFLAGS)  y.tab.c b.o \
        $(getconf LFS_LDFLAGS)             \
        -ly $(getconf LFS_LIBS)

Example 2:

An example of compiling a program that has a "large" off_t, does not use fseeko() or ftello(), and has no application-specific libraries:

   c89  $(getconf LFS_CFLAGS)  a.c         \
        $(getconf LFS_LDFLAGS)             \
        $(getconf LFS_LIBS)

Example 3:

An example of compiling a program that has a "default" off_t and uses fseeko() and ftello():

   c89 -D_LARGEFILE_SOURCE     a.c

Example 4:

An example of compiling a program using transitional versions of SUS interfaces such as lseek64() and fopen64():

   c89  -D_LARGEFILE64_SOURCE              \
        $(getconf LFS64_CFLAGS)  a.c       \
        $(getconf LFS64_LDFLAGS)           \
        $(getconf LFS64_LIBS)

Example 5:

An example of running lint on a program with a "large" off_t:

   lint -D_LARGEFILE_SOURCE                \
        $(getconf LFS_LINTFLAGS) ...       \
        $(getconf LFS_LIBS)

Example 6:

An example of running lint on a program using the transitional API:

   lint -D_LARGEFILE64_SOURCE              \
        $(getconf LFS64_LINTFLAGS) ...     \
        $(getconf LFS64_LIBS)

These examples show the need for the additional variables LFS_CFLAGS, LFS_LDFLAGS, LFS_LIBS, LFS_LINTFLAGS, LFS64_CFLAGS, LFS64_LDFLAGS, LFS64_LIBS and LFS64_LINTFLAGS to be reported by getconf.

Implementations may permit the linking of object files that are compiled with differing off_t environments. For example, an object module compiled with a 32-bit off_t can be linked with an object module compiled with a 64-bit off_t. In such a case, both 32-bit off_t and 64-bit off_t API calls may be used on the same file descriptor. Implementations may instead disallow this linking.

Appendix A: Rationale and Notes

In a mixed environment the size of an off_t (and other types) might differ from program to program, and in a transitional environment (see 3.0 Transitional Extensions to the Single UNIX Specification) it might differ even from routine to routine within a single program. Each specific use of an off_t has an invariant size that is determined by the compilation environment. This is referred to below as the size which is "in use".

A.1 Overview

A.1.1 Guiding Principles

A.1.1.1 "No Lies" Rule

An error will be returned whenever a function cannot return the correct result of an operation.

Returning a "lie" to allow for common uses of a function (e.g. use of stat() to determine if a file exists) could inadvertently cause a correctly written application to operate incorrectly.

It is conceivable that returning a "lie" could keep an incorrectly written application from malfunctioning in a way that creates a serious problem, but no such applications are known to exist. (Of course it would be easy to contrive one.)

PASC Interpretation reference 1003.1-90 #38 completed by the POSIX.1 interpretations committee confirms that POSIX.1 conforming implementations are not allowed to lie to applications. This interpretation explicitly states that if the file size will not fit in an object of type off_t, fstat() must fail. In addition, PASC Interpretation reference 1003.1-90 #75 went on to clarify that EOVERFLOW would be a legal extension to report this condition.
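
A minimal sketch of how an application that uses stat() only as an existence test might cope with this rule: an EOVERFLOW failure still shows that the file is present. The enum and the function name file_exists are hypothetical.

```c
#include <errno.h>
#include <sys/stat.h>

enum file_presence { FILE_MISSING, FILE_PRESENT, FILE_UNKNOWN };

/* Hypothetical helper: existence test that tolerates EOVERFLOW. */
enum file_presence file_exists(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == 0)
        return FILE_PRESENT;
    if (errno == EOVERFLOW)
        return FILE_PRESENT;    /* exists, but too large to describe */
    if (errno == ENOENT || errno == ENOTDIR)
        return FILE_MISSING;
    return FILE_UNKNOWN;        /* e.g. EACCES: cannot tell */
}
```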

A.1.1.2 "Open Protection" Rule

An open() will fail if the size of the (regular) file cannot be represented correctly in an object of type off_t.

The size of the files on which a program is able to operate is determined by the off_t in use for the open(). The open protection rule ensures that old binaries do not operate on files that are too large to handle correctly, and prevents the binaries from generating incorrect results or corrupting the data in the file.

An argument against open protection is that requiring opens to fail will break some binaries that would have worked perfectly well otherwise. For example, a cat program does a loop of open(), read()/write() pairs, and close() for each input file. This program would unnecessarily break due to open protection. But this "Let it Run" argument is flawed in that there is no known utility which fails due to open protection but would work "perfectly well" if only we "let it run". Real versions of the cat program use fstat() to determine whether the input and output files are the same, have a -n option (count newlines) which will fail on sufficiently large files and so on.

Another argument against open protection is that it is unnecessary because an error will be returned as soon as a function cannot return the correct result of an operation ("No Lies" rule). However, most programs check for the success of the open() call, but many do not check for overflow or error after lseek() and other calls. An audit of the standard utilities uncovered numerous examples.

An argument for open protection is that it increases the likelihood of an immediate and informative error message. The error message is likely to include the name of the file that could not be opened. It is much less likely that an lseek() error message will be as immediate or as informative. The delay in, or complete lack of, reporting such errors may result in "silent failure".

Another argument for open protection is that there are numerous plausible scenarios in which this rule avoids serious harm. It prevents typical implementations of the touch utility from truncating large files to 0 length (see A.2.1.1.4 creat()). It can prevent silent failure, which has been demonstrated to occur in at least one commercial data management system. With open protection a commercial backup/restore system will report errors on files that might otherwise result in a corrupted backup tape. It prevents typical implementations of dbm/ndbm from returning incorrect results from a database whose size exceeds the off_t in use for the dbm routines.

A.1.1.3 "Read/Write Limit" Rule

For regular files, no data transfer will occur past the offset maximum established in the open file description.

This rule raises two separate issues: that there is an application-dependent limit on read() and write(), and that the limit is "the offset maximum established in the open file description". The second issue is deferred to A.1.2.1 Offset Maximum. The first issue, that there be an application-dependent limit, is considered here.

There are two assertions upon which many applications rely:

A file can be read until end-of-file and written until the file system is full or some other implementation limit is reached.
The current file offset can be stored correctly in an object of type off_t, and any file position that can be reached with read() and write() can also be reached with lseek().

In a mixed off_t environment these assertions are true only for the largest supported size of off_t. An audit of typical applications revealed that most check return codes from read() and write() in order to guard against end-of-file, full file systems, and the like, but that most do not check for overflow of file offsets or errors returned by lseek(). This suggests that it is more important to maintain the truth of the second assertion. In order to maintain the second assertion, read() and write() must not be permitted to move the file offset past the largest offset representable by the application's off_t.
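
The kind of return-code checking that the audit found to be common can be sketched as follows. A loop of this shape would see the write limit as an ordinary write() failure (EFBIG under this specification) with no extra code. The helper name write_all is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf to fd, retrying short
 * writes. Returns 0 on success, -1 with errno set on failure; errno is
 * EFBIG when the offset maximum or a file size limit is reached. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted, try again */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```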

The write limit avoids the unintuitive situation in which a program could create a file too large for it to open (due to open protection). This could result in a serious problem. "Can you imagine the reaction of someone who has 1.9G of data, and all of a sudden, the DBMS can no longer open the file? I wouldn't want to be working in tech support that day."

An argument for the write limit is that it keeps a program from creating a file too large for it to handle properly. An argument for the read limit is that it is a simple way to cover the hole where a file grows after it is opened.

An argument for the read/write limit rule is that generating an error at this limit provides the earliest possible warning of an incompatibility problem that could result in lost or corrupted data if the application were to continue.

An argument against the read/write limit rule is that it results in unnecessary breakage of binaries that would have worked perfectly well otherwise. This is the "Let it Run" argument, but as noted earlier few if any such programs exist.

Another argument against the read/write limit rule is that implementing it is expensive and complex. But it has already been implemented and found not to be either expensive or complex (an analysis appears in A.1.2.1 Offset Maximum).

Another argument against the read/write limit rule is that it can result in a truncated log file record (hence corrupting the log file). But this truncation and corruption can also occur due to insufficient disk space or RLIMIT_FSIZE, and indeed the standards require that this occur.

Another argument against the read/write limit rule is that instead one can use the existing file size resource limit (RLIMIT_FSIZE). But this is not a useful defense in a mixed off_t environment because it unnecessarily restricts the size of files created by programs which support a larger off_t. The practical effect will be that using RLIMIT_FSIZE in this way will inconvenience users, who will then remove the limit for themselves, leaving no write limit at all. So this is a false, although attractive, argument.

Another argument against the read/write limit rule is that instead there can be a mount option which limits the maximum size of a file created in the file system. But regardless of other merits for such an option, it does not provide a useful defense in a mixed off_t environment because it unnecessarily restricts the size of files created by programs which support a larger off_t. The practical effect will be that the system administrator will be pressured into remounting the file system with no limit and then there will be no write limit. So this is another false, although attractive, argument.

A.1.1.4 Holes in the Protection Mechanism

The following holes in the protection mechanism are discussed in other sections of this document:

While a "small" application has a file open another "large" application can extend the file (see A.1.2.1 Offset Maximum).
The fcntl() function may inadvertently clear O_LARGEFILE (see A.3.1.1.1 fcntl()).
The lseek() failure may result in corruption of log file or database (see A.2.1.1.6 fgetpos(), fseek(), ftell(), lseek()).
An open file description with a "large" offset maximum may be inherited by a "small" application (see A.1.2.2 Inheritance).

A.1.2 Concepts

A.1.2.1 Offset Maximum

The offset maximum is used to implement the read/write limit (see A.1.1.3 "Read/Write Limit" Rule). It is basically a hack to avoid the need to provide transitional versions of read()/write() and the numerous routines which call them (getchar(), putchar(), printf(), etc.). For consistency it also affects the semantics of ftruncate() and mmap().

The offset maximum is an unusual part of this specification as it is associated with the file description whereas in all other cases the limit is determined by the size of the type that is used for the call. But determining the latter for read/write would be extremely difficult in an environment in which a single process contains calls with differing sizes of off_t in use (this environment is not part of this section of the specification, but it is part of the transitional specification). In such an environment it would be necessary to determine the size of off_t for every function that might result in a read() or write(). That would include putchar(), fwrite(), fputs(), fprintf(), puts(), etc. The number of the routines that might potentially do a read() or write() is too large for such an implementation to be practical.

It is possible that while a "small" application has a file open another application with a larger off_t can extend the file beyond the size of the small application's off_t. This leads to a situation where the small application has a file descriptor which refers to a file too large for it to be able to process correctly. That is, open protection has been lost. The application will still have some protection due to "No Lies" and the "Read/Write Limit", but these are less effective protections. It is believed that this case is sufficiently unlikely that it may be safely ignored.

As an added protection, it has been suggested that all file calls should fail whenever the size of the file cannot be represented correctly in an object of type off_t. This would defend against the file growth scenario described above. But checking the file size on each read or write might hurt performance in some cases, and it was not considered an important defense. It would also have the same implementation problem with putchar(), fwrite(), etc.

It has been suggested that a file should not be permitted to be extended beyond the size of the smallest offset maximum in any open file description that refers to the file. It is believed that this is an unnecessary complication, that it cannot be enforced for some distributed file systems, and that it applies only to a situation that may be safely ignored.

The value of the offset maximum in an open file description will not affect the semantics of operations related to other open file descriptions or of operations which create new open file descriptions, including other open file descriptions which refer to the same file.

An argument against offset maximum is that it is expensive and complex. But that is not the case. The only implementation that will matter for years is for 64-bit off_t which

can be implemented as an open file flag (O_LARGEFILE -- see 3.1.2.4 <fcntl.h>).
will require about 5 lines in headers (e.g. <sys/fcntl.h>).
will require about 0 lines to set it during a 64-bit open().
will require about 5 lines of code to check and enforce it in each of the kernel implementations of read() and write().
will require about 2 lines of code to display it in each of the programs which display file flags (e.g. pstat utility).

Documentation would add a dozen or so lines of text, but this part of the specification does not require such documentation.

A.1.2.1.1 Offset Maximum and the 2G-1 File Size Limit

On implementations where type off_t is a 32-bit two's complement integer, the maximum value that can be correctly represented in an object of type off_t is 2^31-1 (2G-1). Because of this, the maximum file size and maximum file offset of a small file are 2G-1, but the maximum offset of any byte contained in a small file is 2G-2. An illustration of the offsets (0, 1, ...) of a file, with the bytes (b, B and L) shown as small boxes and the offset shown as "^" is:

        <- "small" -> | <- "large" >-
    ----------   -----------------------   
    | b | b | ::: | b | B | L | L | L | :::
    ^---^---^-   -^---^---^---^---^---^-   
    0   1   2     2G  2G  2G
                  -2  -1

Although an lseek() can be done to the 2G-1 offset, a read() or write() cannot be performed at that position because when B (counting number 2G, but offset 2G-1) is read or written, the resulting pointer to the next offset address and the file size itself would overflow.
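The boundary behaviour described above can be checked with a small calculation. The following is only a user-space sketch of the kind of test an implementation might apply in read() and write(), not code from any actual kernel. The function name clamp_to_offset_max and its parameters are hypothetical, and offsets are modelled as int64_t so that the arithmetic itself cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes may be transferred starting at
 * `offset` without the resulting file offset exceeding `offset_max`.
 * Returns 0 when nbyte is 0, -1 when no byte can be transferred (the
 * caller would fail the request), otherwise a count from 1 to nbyte.
 */
static int64_t clamp_to_offset_max(int64_t offset, size_t nbyte,
                                   int64_t offset_max)
{
    if (nbyte == 0)
        return 0;
    if (offset < 0 || offset >= offset_max)
        return -1;              /* already at or past the limit */

    int64_t room = offset_max - offset;
    if ((uint64_t)nbyte > (uint64_t)room)
        return room;            /* partial transfer up to the limit */
    return (int64_t)nbyte;
}
```

With an offset maximum of 2G-1, a 10 byte request at offset 2G-2 is clamped to the single byte b, and any request at offset 2G-1 (byte B) is refused.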

A.1.2.2 Inheritance

The offset maximum will be inherited via fork(), the exec family of functions, dup(), and fcntl() called with F_DUPFD, and its value will not be altered by them. The value of the offset maximum will not affect any semantics related to inheritance.

An application can inherit, via the exec family of functions, a file descriptor that is associated with a file whose size exceeds the largest value that can be represented correctly by the off_t that is in use by the application. An example is if a shell that was compiled with a 64-bit off_t does input or output redirection of a 10 gigabyte file and then executes a program which was compiled with a 32-bit off_t. In such a case the large file unaware application will function until attempting an operation from which the results cannot be correctly returned.

Most inherited files are due to shell redirection; the other cases are rare and typically under the complete control of a single application provider. The cases that are of primary concern are:

     old_binary < large_file

and

     old_binary > large_file

In these cases a pre-existing application binary, old_binary, is given a file descriptor to a file that it would not have been able to open for itself and would be able to read and write past the limit that would have been established by the open(). The concern is that the application will do something destructive or generate incorrect results since it is not expecting a file to be so large.

In comparison, consider the following cases:

     a.out | old_binary

and

     old_binary | a.out

There is no limit to the amount of data that may be passed through a pipe. In the first case the application named a.out may push more data through the pipe than can be contained in a small file. In the second case a.out may be willing to read more data than can be contained in a small file. If a pre-existing application binary has problems with inherited file descriptors that refer to large files then it is likely to have a pre-existing problem when using a pipe for large amounts of data. While it is true that the two sets of cases are not completely equivalent, the above examples show that pre-existing binaries have had the potential to see data streams larger than the amount of data that can be contained in a small file.

Another reason it is believed that the inheritance of file descriptors does not cause problems is that the majority of existing applications do not perform seek operations on standard input or standard output.

A.1.2.3 Non-Requirements

Open protection and the read/write limit apply only to regular files, and are not specified to apply to block or character special files such as raw disk partitions.

A.1.2.4 Non-Changes

The following are to clarify, not to change, existing practice: Different files may have different maximum permitted sizes even when they are on the same system, or are on the same type of file system, or are on the same file system. The maximum permitted file sizes are independent of the offset maximum. The maximum permitted file sizes do not have specified minimum or maximum values. Attempts to grow a file via write(), writev(), or truncate() may fail even when statvfs() reports that space is available.

A.1.2.5 NFS Quality of Implementation Issue

NFS does not fall within the confines of this specification since there are no relevant NFS interfaces. However, here are some suggestions for NFS implementations.

The NFS version 2 protocol is effectively a 32-bit application since it cannot handle file sizes larger than 2^31-1 bytes. Any attempt by an NFS V2 client to access a large file (read(), write(), stat(), etc.) should be rejected by the server since the server knows the file is large and knows the application (NFS V2) is not "large file aware". This test is trivial and requires no more performance penalty than the tests for any other file system type.

The NFS version 3 protocol is "large file aware" since it can handle file sizes up to 2^63-1 bytes. An NFS V3 server would handle all requests without change, even if the request involves a large file. It is up to the NFS V3 client code to determine if the application accessing a file is "large file aware" or not. This should be handled in the standard fashion in the OS on the client side machine using the attributes returned by the NFS operation or the cached file attributes. While this does not provide perfect protection or immediate detection of files that have grown beyond 2^31-1 bytes since being opened, it is no more broken than the rest of NFS. (See below for more discussion of cached file attributes).

This does not address the issue of NFS V3 clients that are not prepared to handle "large files". If they are carefully written and obey the NFS V3 protocol they should realize that files can be larger than 2^31-1 bytes and handle this condition appropriately, probably by failing the operation (they would know this when a stat(), read(), write(), etc. operation returned a file size larger than 2^31-1). However, there are probably NFS V3 clients that are not carefully written. We really can't do much about that.

Cached Attributes: with the NFS V3 protocol, clients are not required to cache the file attributes, and servers are not required to return the file attributes with each operation. If the file attributes are returned with each operation, it is easy to determine if the file has grown past the large file limit. If not, the cached attributes can be consulted.

If the client does not cache attributes, then it will either have to request the attributes from the server over the wire (adversely affecting performance) or assume the file has not grown in size since it was opened. This specification pretty much requires the client code to check the file size at open.

Because of the stateless nature of NFS, it is difficult to ensure that a large-file unaware application cannot operate on a file that has grown from small to large. This is for the same reasons that NFS cannot implement standard UNIX file semantics. However, it is easy to ensure that a large-file unaware application does not grow a small file to become large (since the offset and length of each write are determined at the client, the client can fail any operation where the offset plus length exceeds the small file limit). It is also easy to ensure that a large-file unaware application does not read past the small file limit.

A.2 Changes to the Single UNIX Specification

A.2.1 Changes to CAE Specification System Interfaces and Headers, Issue 4, Version 2

A.2.1.1 Changes to System Interfaces

A.2.1.1.1 Notes on Functions not Modified by this Proposal

The following functions do not require modification to meet the terms of this proposal:

aio_error(), aio_cancel(), aio_return() and aio_suspend(): No large file implications were identified for these functions.
aio_fsync(): It is possible that an aio_fsync() could try to write out file blocks that are beyond the offset maximum, just as fsync() could. There is no compelling reason for either to fail. Clearly, the original write request had to be within the offset maximum for the file description used. The aio_fsync() function will not enforce the offset maximum on the blocks which it writes out.
glob() and wordexp(): The subroutines that expand file name wild cards need to be large file capable.

A.2.1.1.2 aio_read()

The aio_read() function enforces the offset maximum rules for consistency with read() and readv().

A.2.1.1.3 aio_write()

The aio_write() function enforces the offset maximum rules for consistency with write() and writev().

A.2.1.1.4 creat()

The creat() function will fail if the named file is a regular file and the size of the file cannot be represented correctly in an object of type off_t (see 2.2.1.24 open()). This offers protection from the following coding style:

     if (stat(path, ...) < 0) {
         /* assume file does not exist, so create it */
         if ((fd = creat(path, ...)) < 0) {
            /* print out error text */
         }
     }

In this example the stat() function is being used to determine the existence of a file. But if the file size cannot be represented correctly in an object of type off_t then stat() will fail (see 2.2.1.14 fstat(), lstat() and stat()) and if creat() did not then fail it would have the unintended effect of truncating the file to 0 length. Many applications and standard utilities have code similar to this example, including typical implementations of the touch utility.
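A more defensive form of the same logic distinguishes "does not exist" from other stat() failures. The sketch below is an illustration only; the helper name create_if_absent is hypothetical, and it uses open() with O_CREAT and O_EXCL rather than creat() so that an existing file is never truncated.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create `path` only if it truly does not exist.
 * Returns a file descriptor on success, or -1 with errno set.
 */
static int create_if_absent(const char *path, mode_t mode)
{
    struct stat sb;

    if (stat(path, &sb) == 0) {
        errno = EEXIST;         /* file is present: leave it alone */
        return -1;
    }
    if (errno != ENOENT)
        return -1;              /* e.g. EOVERFLOW: exists but is too large */

    /* O_EXCL makes the open fail rather than reuse a file that
     * appeared between the stat() and the open(). */
    return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}
```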

A.2.1.1.5 fcntl() and lockf()

Unlock requests are sometimes "rounded to infinity" so that a process can create a whole-file lock and then successfully issue a request to clip off the beginning of the lock without leaving behind an unrepresentable lock. This is to avoid breaking any existing 32-bit applications which might happen to do this.

Several existing implementations of fcntl() permit locking the byte whose offset is the maximum value that can be represented correctly in an object of type off_t, even though write() cannot write to that offset. This specification permits that behavior.

The fcntl() function will fail if the cmd argument is F_GETLK and the first lock which blocks the lock description has a starting offset or length which cannot be represented correctly in an object of type off_t. Information about such a lock cannot be correctly returned.

Discussion of the semantics of fcntl() locks that cross the off_t boundary resulted in six competing proposals:

1. An unlock request fails if it would create an unrepresentable lock.
2. If any lock request includes the byte whose offset is the maximum value that fits in an off_t, then the request is equivalent to a request where l_len is 0 and l_start refers to the first byte of the affected area.
3. (proposal was dropped)
4. If l_len is 0 then the lock is through and including the maximum value of off_t (and not beyond).
5. Just no lies.
6. If an unlock request includes the byte whose offset is the maximum value that fits in an off_t, and there is an existing lock with l_len equal to 0 which also includes that byte, then the request is equivalent to a request where l_len is 0 and l_start refers to the first byte of the affected area.

An advantage of 2, 4, and 6 is that they do not change existing behavior of a 32-bit application.

Proposals 1 and 5 can result in a new type of failure in the case where the program creates a lock with l_len equal to 0 and then clips off the beginning leaving behind an unrepresentable lock.

Proposal 4 precludes truly "whole file" locking.

Proposal 6 was adopted because it preserves existing 32-bit behavior and is less disruptive than proposal 2 (which extends lock requests in addition to unlock requests).

The fcntl() and lockf() functions will fail if the offset of the first byte in the region, or if l_len (size) is non-zero then the offset of the last byte in the region, exceeds the largest possible value in an object of type off_t. Otherwise the process could create a lock which would be "beyond" the ability of the program to represent.
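The representability rule for a lock region can be written as a short predicate. This is a sketch under stated assumptions: the function name lock_region_representable is hypothetical, offsets are modelled as int64_t, only non-negative lengths are considered, and off_max stands for the largest value of the off_t in use.

```c
#include <stdint.h>

/*
 * Hypothetical check: returns 1 if a lock starting at `start` with
 * length `len` can be described with offsets no larger than `off_max`,
 * and 0 otherwise. A length of 0 means "to the end of the file".
 */
static int lock_region_representable(int64_t start, int64_t len,
                                     int64_t off_max)
{
    if (start < 0 || len < 0 || start > off_max)
        return 0;
    if (len == 0)
        return 1;
    /* Last byte is start + len - 1; compare without overflowing. */
    return len - 1 <= off_max - start;
}
```

Note that a one-byte lock on the byte at the maximum offset passes this check, which matches the behavior permitted earlier in this section.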

A.2.1.1.6 fgetpos(), fseek(), ftell(), lseek()

These functions will fail if the resulting file offset would exceed the largest value that can be represented correctly in the related type which is in use for the call, and will set errno to EOVERFLOW (permitted by PASC Interpretation 1003.1-90 #75).

Programs typically, but incorrectly, fail to check the return value of these functions, which renders the error return less useful. On the other hand, returning an incorrect offset can result in serious malfunction as well.

An lseek() to the end of a file using

     lseek(fd, 0, SEEK_END);

is quite common. It is unfortunate that such calls fail on a too-large file since the return value is usually ignored. One alternative that was considered was for lseek() to move the file offset for all valid requests and then return an error if the resulting offset is too large. That is, the call would succeed for applications that do not check the return code, but also fail for applications that do check. This option was deemed too bizarre to adopt. For example, it might be difficult to implement using a remote procedure call system that was constructed to return either results or an error, but not both. In addition, the POSIX 1003.1 standard requires the file offset to remain unchanged if an error is returned by lseek(). It was felt that the open protection (see A.1.1.2 "Open Protection" Rule) and the read/write limit (see A.1.1.3 "Read/Write Limit" Rule) are more effective defenses against this problem.

Another potentially serious consequence of ignoring the return value of lseek() is that programs which extend data files by attempting to seek beyond the end-of-file and then writing may instead overwrite existing data.

For example, typical implementations of the dbm and ndbm libraries contain code such as:

     (void) lseek(db->dbm_pagf, blkno*PBLKSIZ, L_SET);
     if (write(db->dbm_pagf, pagebuf, PBLKSIZ) != PBLKSIZ)
                ... error handling ...

The problem is that the return code of lseek() is not checked and so if "blkno*PBLKSIZ" overflows the lseek() will fail (or will seek to an unintended offset) and the data will be written to an unintended offset.
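A checked version of this pattern might look like the following. It is a sketch, not the dbm source: the names write_block and off_t_max are hypothetical, SEEK_SET is used in place of the older L_SET, and the computation of the largest off_t value assumes a two's complement off_t without padding bits.

```c
#include <limits.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Largest off_t value (assumes two's complement, no padding bits). */
static off_t off_t_max(void)
{
    return (off_t)(((uintmax_t)1 << (sizeof(off_t) * CHAR_BIT - 1)) - 1);
}

/*
 * Hypothetical helper: write one block of `blksize` bytes at block
 * number `blkno`. Returns 0 on success and -1 on failure. Nothing is
 * written unless the seek landed exactly where requested.
 */
static int write_block(int fd, off_t blkno, const void *buf, size_t blksize)
{
    if (blkno < 0 || blksize == 0 || blksize > (uintmax_t)off_t_max())
        return -1;
    /* Reject the request if blkno * blksize would not fit in off_t. */
    if (blkno > off_t_max() / (off_t)blksize)
        return -1;

    off_t want = blkno * (off_t)blksize;
    if (lseek(fd, want, SEEK_SET) != want)
        return -1;
    if (write(fd, buf, blksize) != (ssize_t)blksize)
        return -1;
    return 0;
}
```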

A.2.1.1.7 fpathconf() and pathconf()

The reference "See Note 3,4" refers to notes in the X/Open specification for fpathconf() and pathconf(). These notes indicate that this option (_PC_FILESIZEBITS) is valid only for a directory, and the results are for files that exist or may be created in that directory.

The _PC_FILESIZEBITS option makes it possible for a process to determine how large a file can be created in a given directory. It takes into account implementation limitations in the file system (e.g. due to the size of file size and block count variables), and it takes into account long term policy limitations (e.g. due to the mount utility's -o nolargefiles option). It does not take into account dynamic restrictions such as the RLIMIT_FSIZE resource limit or the number of available file blocks, so the process must perform appropriate checks.

When the current directory is on a typical large file capable file system and is mounted with the -o nolargefiles option,

     pathconf(".", _PC_FILESIZEBITS);

will return 32. In general, if the maximum size file that could ever exist on the mounted file system is maxsize then the returned value is 2 plus the floor of the base 2 logarithm of maxsize.
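The formula can be verified with a few lines of arithmetic. The function below is a hypothetical illustration of the calculation only; it is not how any particular file system derives the value.

```c
#include <stdint.h>

/*
 * 2 + floor(log2(maxsize)) for maxsize >= 1: the number of bits needed
 * to hold maxsize in a signed integer.
 */
static int filesizebits(uint64_t maxsize)
{
    int bits = 2;               /* sign bit plus the lowest value bit */

    while (maxsize >>= 1)       /* one more bit per halving */
        bits++;
    return bits;
}
```

For a maxsize of 2^31-1 the base 2 logarithm has a floor of 30, giving 32; for 2^63-1 the result is 64.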

A.2.1.1.8 fseeko() and ftello()

These functions are needed because fseek() and ftell() are limited by the long offset type required by ISO C. The fsetpos() and fgetpos() functions, although they do use an opaque offset type, are not complete replacements for fseek() and ftell() because they do not allow relative seeks or arithmetic on fpos_t values.

A.2.1.1.9 fsetpos()

Since fsetpos() sets an absolute file position, which is always legal regardless of the implementation-supported sizes of off_t, there are no new error returns or other new semantics.

A.2.1.1.10 fstatvfs() and statvfs()

These functions will fail if the total, or free, or available number of blocks or files cannot be represented correctly in the structure to be returned (f_blocks, f_bfree, f_bavail, f_files, f_ffree, f_favail).

A.2.1.1.11 ftruncate(), truncate(), unlink()

These functions are used only on pre-existing files and so do not have the potential programming hazard as does creat() (see A.2.1.1.4 creat()).

When ftruncate() is used to increase the size of a file, the semantics are similar to a write() of zeroes to the file. For consistency with write(), the ftruncate() function will fail when the request is beyond the offset maximum (even if the effect of the request would be to shorten the file).

A.2.1.1.12 ftw() and nftw()

The ftw() and nftw() functions may fail if a stat() in the underlying implementation fails with EOVERFLOW. This is unfortunate because "small" binaries using these functions cannot reasonably be used on file trees containing "large" files. Some systems have a non-standard extension to nftw() which permits it to continue when stat() fails (typical failures also include ESTALE and ELOOP).

A.2.1.1.13 getrlimit() and setrlimit()

These functions map limits that they cannot represent correctly to and from RLIM_SAVED_MAX and RLIM_SAVED_CUR. These values do not require any special handling by programs. They may be thought of as tokens that the kernel hands out to programs that can't handle the real answer, and that remind the kernel, when the tokens come back from the user, of what value is really meant.

If setrlimit() fails for any reason (for example, EPERM), the resource limits and saved resource limits remain unchanged.

This proposal does not specify any particular value for RLIM_INFINITY, RLIM_SAVED_MAX or RLIM_SAVED_CUR. Typical current implementations use the value 0x7FFFFFFF for RLIM_INFINITY, and it is recommended that RLIM_SAVED_MAX and RLIM_SAVED_CUR have similar large values.

Few, if any, programs will need to refer explicitly to RLIM_SAVED_MAX or RLIM_SAVED_CUR. Those that do should not use them in C-language switch cases since they may have the same value in some implementations (see 2.2.2.3 <sys/resource.h>).

A limit that can be represented correctly in an object of type rlim_t is either "no limit", which is represented with RLIM_INFINITY, or has a value not equal to any of RLIM_INFINITY or RLIM_SAVED_MAX or RLIM_SAVED_CUR and which can be represented correctly in an object of type rlim_t and which meets any additional implementation-specific criteria for correct representation.

A rejected alternative proposal was to map limits that could not be represented to and from RLIM_INFINITY. This would avoid the need for the new symbols RLIM_SAVED_MAX and RLIM_SAVED_CUR. But such mapping would arguably be a lie, and the resulting information loss would cause unintuitive program behavior, especially in programs running with appropriate privileges needed to raise hard limits.

A rejected alternative proposal was that if getrlimit() could not correctly return a current limit then it should instead return -1 and set errno to EOVERFLOW. But that would result in unnecessary breakage of programs. (Note that this breakage occurs even when no large files are present.) It would also result in malfunction of programs that assume that they are calling getrlimit() properly and so failure "cannot happen". For example, in the 4.4 BSD-Lite distribution, there are at least 15 unchecked calls to getrlimit(). When the 4.4 BSD csh limit function is used to report the current limits, there is no check of the return code and so the reported results can be entirely incorrect. Also, non-superuser programs typically unlimit themselves with:

     getrlimit(RLIMIT_STACK, &rl);
     rl.rlim_cur = rl.rlim_max;
     setrlimit(RLIMIT_STACK, &rl);

If the getrlimit() fails then garbage is passed to setrlimit() which may result in an unwanted and extremely restricted limit. Several utilities that are part of the GNU C compiler have this problem.
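For comparison, a version of the same sequence that checks both calls is sketched below. The function name raise_stack_limit is hypothetical.

```c
#include <sys/resource.h>

/*
 * Raise the soft stack limit to the hard limit, checking each call.
 * Returns 0 on success and -1 on failure. If getrlimit() fails,
 * setrlimit() is never called.
 */
static int raise_stack_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;              /* rl is indeterminate: do not use it */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_STACK, &rl) == 0 ? 0 : -1;
}
```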

A.2.1.1.14 lio_listio()

The lio_listio() function enforces the offset maximum rules since its operations are logically equivalent to aio_read() and aio_write(), which enforce them.

A.2.1.1.15 mmap()

For consistency with read() and write(), the mmap() function will fail when the request extends beyond the offset maximum.

A.2.1.1.16 open()

The open() function called with O_TRUNC set will fail without truncation if the named file is a regular file and the size of the file cannot be represented correctly in an object of type off_t. (See A.2.1.1.4 creat()).

A.2.1.1.17 read(), readv(), write() and writev()

These functions may do a "partial read or write" due to the offset maximum. That is, the value returned may be less than nbyte if the number of bytes remaining which may be transferred is less than nbyte.
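An application that must transfer a whole buffer therefore needs to cope with short counts. The loop below is a common idiom, shown as a sketch; the helper name write_fully is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling write() until all `len` bytes are
 * written or an error occurs. Returns the number of bytes written; a
 * result smaller than `len` means a write() failed or made no progress.
 */
static size_t write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any transfer */
            break;              /* real error: errno says why */
        }
        if (n == 0)
            break;              /* no progress: avoid looping forever */
        done += (size_t)n;      /* partial write: continue from here */
    }
    return done;
}
```

When a partial write stops at the offset maximum, the next iteration attempts a write at that limit and fails, so the caller sees a short total together with the errno value from the failing call.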

A.2.1.1.18 ulimit()

The ulimit() function will return an unspecified result if the result cannot be represented correctly in an object of type long. As this function is already obsolescent, the use of getrlimit() and setrlimit() is recommended for getting and setting process limits.

A.2.2 Changes to CAE Specification Commands and Utilities, Issue 4, Version 2

A.2.2.1 General Porting Suggestions

When porting a program to be large file capable, general areas of concern in addition to the issues mentioned in A.1.1.4 Holes in the Protection Mechanism include:

command line arguments
API conversion
type conversion
output formatting
fixed format media issues
other languages

A.2.2.1.1 Command Line Arguments

Numeric arguments which are file size related, such as a file offset or block count, need to be handled as an appropriately large type. Converting arguments into an off_t that is larger than a long may need to be accomplished with non-standard scanf() formats, if available, or with portable user-written functions that convert ASCII to a large off_t analogous to the strtol() function.
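A user-written conversion function of the kind mentioned above might be sketched as follows. The name parse_size is hypothetical, int64_t stands in for a large off_t, and only non-negative decimal input is accepted.

```c
#include <stdint.h>

/*
 * Hypothetical strtol()-like parser for non-negative decimal sizes.
 * Stores the value in *out and returns 0 on success. Returns -1 if
 * the string is empty, contains a non-digit, or exceeds `max`.
 */
static int parse_size(const char *s, int64_t max, int64_t *out)
{
    int64_t v = 0;

    if (s == 0 || *s == '\0' || max < 0)
        return -1;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return -1;
        int d = *s - '0';
        /* Refuse the digit if v * 10 + d would exceed max. */
        if (d > max || v > (max - d) / 10)
            return -1;
        v = v * 10 + d;
    }
    *out = v;
    return 0;
}
```

Passing the largest value of the target type as max lets the same function serve both a 32-bit and a 64-bit off_t.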

A.2.2.1.2 API Conversion

The program should be recompiled in a large off_t environment or, alternatively, should be converted to use the transitional API. In either case the source must be scanned for the functions listed in 3.1.1.1 64-bit Versions of Interfaces and the data types listed in 3.1.2.1 64-bit Versions of Headers to ensure that all types are properly converted.

A.2.2.1.3 Type Conversion

Whenever a new 64-bit function is used, the argument types and function result will need to be converted as appropriate. Whenever a variable's type is converted (whether via the large off_t compilation environment or the transitional API), all uses of the variable must be checked to determine if further type conversions are warranted. For example, wherever there is a struct stat, all uses of st_size must be checked. If the st_size value is assigned or compared with a variable "v" the variable "v" must be converted if necessary and all uses of "v" must in turn be checked. This is also true of type conversions required for command line arguments.

In addition, the program needs to be checked for file size related variables such as offsets, line numbers, and block counts that must be converted to a large off_t or related type. These variables typically appear inside loops that are performing input and/or output.

A.2.2.1.4 Output Formatting

Output of types that have been converted will probably involve using a different printf() format or using a revised user-written conversion routine. Since there is a larger range of values which take up more space, revision of the output layout may be required.
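One way to avoid depending on the width of off_t in format strings is to convert to the widest integer type before printing. The sketch below assumes a C99 environment, which provides intmax_t and the %jd conversion; these are later than the systems this document describes, and the helper name format_offset is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a file size or offset as decimal text.
 * The caller converts its off_t to intmax_t, so one format serves
 * every off_t width. Returns the snprintf() result.
 */
static int format_offset(intmax_t value, char *buf, size_t bufsize)
{
    return snprintf(buf, bufsize, "%jd", value);
}
```

A 64-bit value can need up to 20 characters including a sign, so column widths sized for 10-digit values may need to be widened.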

A.2.2.1.5 Fixed Format Media Issues

Current implementations of the tar and cpio utilities are defective in their support of arbitrarily large files. The pax utility is equally defective, but is the subject of a proposal in ballot. (See 2.3.3 The pax Utility for discussion of this topic.)

Vendor and third-party backup software is also unable to support large files and will require modification in order to do so.

A.2.2.1.6 Other Languages

This specification is for the C language only. Other languages have different support requirements. For example, the Fortran I/O API has a limit on the number of records, not bytes.

A.2.2.2 Considerations for Utilities in Support of Files of Arbitrary Size

The utilities listed in 2.3.1 Considerations for Utilities in Support of Files of Arbitrary Size are utilities which are used to perform administrative tasks such as to create, move, copy, remove, change the permissions, or measure the resources of a file. They are useful both as end-user tools and as utilities invoked by applications during software installation and operation.

Typical core utilities must be compiled in a "large" off_t compilation environment or must use the transitional APIs. Using the compilation environment reduces the number of editing changes required to port a program, but it does not reduce the effort required to ensure the correctness of the port.

The chgrp, chmod, chown, ln, and rm utilities probably require use of large file capable versions of stat(), lstat(), ftw(), and the stat structure.

The cat, cksum, cmp, cp, dd, mv, sum, and touch utilities probably require use of large file capable versions of creat(), open(), and fopen().

The cat, cksum, cmp, dd, df, du, ls, and sum utilities may require writing large integer values. For example,

The cat utility might have a -n option which counts newlines.
The cksum and ls utilities report file sizes.
The cmp utility reports the line number at which the first difference occurs, and also has a -l option which reports file offsets.
The dd, df, du, ls, and sum utilities report block counts.

The dd, find and test utilities may need to interpret command arguments that contain 64-bit values. For dd the arguments include skip=n, seek=n, and count=n. For find the arguments include -size n. For test the arguments are those associated with algebraic comparisons.

The df utility might need to access large file systems with statvfs().

The ulimit utility will need to use large file capable versions of getrlimit() and setrlimit() and be able to read and write large integer values.

Conversion between off_t (or other derived types) and ASCII is unspecified, which is a significant practical deficiency. This is being considered by other groups. For example, see: ftp://ftp.dmk.com/DMK/sc22wg14/c9x/extended-integers/

A.2.2.3 Additional Requirements for the sh Utility - Porting Recommendations

Pathname expansion (e.g. expanding */foo.c to a/foo.c b/foo.c c/foo.c) and pathname completion might in some cases use the stat() function which would need to be large file capable.

The offset maximum used for shell input and output redirections is implementation-specific. Some vendors prefer to use the smallest supported off_t, others prefer the largest.

A.3 Transitional Extensions to the Single UNIX Specification

A.3.1 Transitional Extensions to CAE Specification System Interfaces and Headers, Issue 4, Version 2

Prior experience with transitional access is reported by SGI, Convex (http://www.sas.com/standards/large.file/background) and Programmed Logic Corporation (http://www.sas.com/standards/large.file/proposals).

A.3.1.1 Transitional Extensions to System Interfaces

A.3.1.1.1 fcntl()

The O_LARGEFILE flag may be set or cleared with F_SETFL. An incorrectly written program may inadvertently clear this flag. For example, some programs put a file into append mode with:

     fcntl(fd, F_SETFL, O_APPEND);

This is incorrect because it turns off all the other open flags, including O_LARGEFILE. Instead, to turn on append mode one should first use F_GETFL to get the current flags:

     int oflag = fcntl(fd, F_GETFL, 0);

then include O_APPEND in the flags:

     oflag |= O_APPEND;

and then set the new flags:

     fcntl(fd, F_SETFL, oflag);

A more complete example would also check for fcntl() failures.
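Such a version, with the failure checks included, might be sketched as follows. The function name set_append is hypothetical.

```c
#include <fcntl.h>

/*
 * Turn on append mode while preserving every other file status flag
 * (including O_LARGEFILE where the implementation has it).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_append(int fd)
{
    int oflag = fcntl(fd, F_GETFL, 0);

    if (oflag == -1)
        return -1;              /* could not read the current flags */
    if (fcntl(fd, F_SETFL, oflag | O_APPEND) == -1)
        return -1;
    return 0;
}
```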

A.3.1.1.2 No fcntl64()

A rejected alternative to extending fcntl() with F_GETLK64 (and so on) would be to specify fcntl64() with F_GETLK (and so on). The former has prior art and less functional redundancy, whereas the latter is more consistent with other transitional functions. This specification does not preclude vendors from supplying an fcntl64().

A.3.1.2 Transitional Extensions to Headers

A.3.1.2.1 <aio.h>

The aio control block has an embedded offset which is of type off_t. A large file enabled aio control block needs a 64-bit offset. For consistency with the other transitional interfaces, a new control block with a 64-bit offset is defined. The offset is of the type off64_t.

Since a new control block is needed, new interfaces are required for all of the existing aio interfaces since every one takes a pointer to the control block as an argument.

A.3.1.2.2 <sys/resource.h>

This proposal does not specify any particular value for RLIM64_INFINITY, RLIM64_SAVED_MAX or RLIM64_SAVED_CUR. Typical implementations should use the value 0x7FFFFFFFFFFFFFFF or 0xFFFFFFFFFFFFFFFF for RLIM64_INFINITY, and it is recommended that RLIM64_SAVED_MAX and RLIM64_SAVED_CUR have similar large values. Even though all limit values will be represented in 64-bit types for a few years, specifying them as distinct values now will reduce compatibility problems in the future when the next transition to a still larger type occurs.

A.3.1.2.3 <sys/types.h>

It is not required that ino64_t be a 64-bit type. However, the NFS version 3 protocol allows for 64-bit file serial numbers. For NFS interoperability with systems making use of 64-bit file serial numbers, 64-bit ino_t support is necessary. DCE also may make use of 64-bit file serial numbers.

A.3.2 Accessing the Transitional Extensions to the SUS

A.3.2.1 Compilation Environment - Visibility of Additions to the API

Applications which use the fseeko() and ftello() interfaces should define _LARGEFILE_SOURCE to be 1, then include <unistd.h> and then test that _LFS_LARGEFILE is 1 to determine if the additional functionality is indeed available. This additional functionality may be available even when _LARGEFILE_SOURCE is not defined, but it will not be available to strictly conforming X/Open programs.

This macro does not affect the size of off_t (see 3.3.3 Mixed API and Compile Environments Within a Single Process).

A.3.2.2 Visibility of Transitional API

Applications which wish to use this transitional functionality should define _LARGEFILE64_SOURCE to be 1, then include <unistd.h>, and then test that _LFS64_LARGEFILE, _LFS64_ASYNCHRONOUS_IO and _LFS64_STDIO are set to 1 to determine if the corresponding transitional functionality is indeed available. This transitional functionality may be available even when _LARGEFILE64_SOURCE is not defined, but it will not be available to strictly conforming X/Open programs.

This macro does not affect the size of off_t (see 3.3.3 Utilities: Optional Method for Specifying the Size of an off_t).

If _LARGEFILE64_SOURCE is defined then _LARGEFILE_SOURCE is implied so it need not also be defined (see 3.3.1 Compilation Environment - Visibility of Additions to the API). Similarly, if _LFS64_LARGEFILE is defined then _LFS_LARGEFILE will be defined so it need not also be tested.
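
As a sketch of the transitional case, the fragment below defines _LARGEFILE64_SOURCE, tests _LFS64_LARGEFILE, and only then uses open64(), lseek64() and off64_t. It tests only the one version macro that covers the interfaces it calls; an application using the transitional asynchronous I/O or stdio interfaces would test _LFS64_ASYNCHRONOUS_IO or _LFS64_STDIO in the same way. HAVE_LFS64_FILES and file_size64() are hypothetical names introduced for illustration, and returning the size as a long long is an assumption made to keep the caller independent of off64_t.

```c
/* Request the transitional API before any system header is included.
 * Per the text above, _LARGEFILE_SOURCE need not also be defined. */
#define _LARGEFILE64_SOURCE 1

#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* HAVE_LFS64_FILES is a hypothetical convenience macro recording the
 * result of the _LFS64_LARGEFILE version test. */
#if defined(_LFS64_LARGEFILE) && _LFS64_LARGEFILE == 1
#define HAVE_LFS64_FILES 1
#else
#define HAVE_LFS64_FILES 0
#endif

/* file_size64() is a hypothetical helper. It returns the size of the file
 * named by path, or -1 on error or when the transitional interfaces have
 * not been announced by the implementation. */
long long file_size64(const char *path)
{
#if HAVE_LFS64_FILES
    int fd;
    off64_t end;

    assert(path != NULL);
    fd = open64(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* lseek64() reports the offset as an off64_t whatever size off_t has. */
    end = lseek64(fd, (off64_t)0, SEEK_END);
    close(fd);
    return (long long)end;
#else
    (void)path;
    return -1;
#endif
}
```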

A.3.2.3 Mixed API and Compile Environments within a Single Process

Mixing objects from differing compile environments can be dangerous, since some types have different sizes in the differing environments. The types might be used in a way where the size difference causes problems. A system may disallow this mixing. To avoid these problems, don't mix such objects in the same executable, or at least ensure that data shared between files compiled differently does not use any of the types whose meaning may change.

Mixing the standard and transitional APIs is relatively safe, since data types have the same meaning in every file. This mixing permits a smoother and faster migration to a larger off_t environment, because it permits asynchronous upgrades. For example, it permits libraries to be made large file aware without requiring large file awareness in all the programs which use the library or in all the libraries which the library uses. (This is true both for static and for shared libraries.) This is particularly beneficial for situations in which the system vendor, one or more third-party suppliers, and the end user may all be supplying libraries or other objects that are components of a complete program.

A.3.2.4 Utilities: Optional Method for Specifying the Size of an off_t

The LFS_CFLAGS variable is used to obtain implementation-specific compiler options, such as flags and preprocessor variable definitions, so that the compiled program will be using a "large" off_t. Similarly, the LFS_LDFLAGS variable supplies link editor options, the LFS_LIBS variable supplies link library names, and the LFS_LINTFLAGS variable supplies lint options.
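
These variables are normally read by build scripts, but a program can query the same value itself. The sketch below assumes the implementation exposes LFS_CFLAGS through confstr() under the name _CS_LFS_CFLAGS (some C libraries do; where the name is absent the sketch reports the value as unavailable). get_lfs_cflags() is a hypothetical helper introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* get_lfs_cflags() is a hypothetical helper. It copies the LFS_CFLAGS
 * string into buf (at most len bytes, including the terminating null) and
 * returns the size needed to hold the whole string, or 0 if the value is
 * not available. Call it with (NULL, 0) to ask only for the size. */
size_t get_lfs_cflags(char *buf, size_t len)
{
    /* confstr() is only well defined when buf and len agree. */
    assert((buf == NULL) == (len == 0));
#ifdef _CS_LFS_CFLAGS   /* assumption: this confstr() name is provided */
    return confstr(_CS_LFS_CFLAGS, buf, len);
#else
    if (len > 0)
        buf[0] = '\0';
    return 0;
#endif
}
```

An empty string is a legitimate result: it indicates that no special options are needed, for example where the default off_t is already large.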

If the size of off_t is controlled by a preprocessor macro variable then it is recommended that the macro be named _FILE_OFFSET_BITS and be supported as follows:

If this symbol is not defined then an implementation-defined default size will be used.
Otherwise, if this symbol has a decimal value equal to the number of bits in one of the implementation-supported sizes of off_t then that size of off_t will be used.
Otherwise, an error message will be written to the standard error and compilation will terminate with a non-zero status.
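
A minimal sketch of the recommended behaviour follows. It assumes an implementation that follows the recommendation and supports a 64-bit off_t; the value 64 is an assumption of this example, not a requirement of the proposal, and off_t_bits() is a hypothetical helper. The macro is defined before the first header is included so that every declaration sees the same size.

```c
/* Ask for a 64-bit off_t; must precede the first system header. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <limits.h>
#include <sys/types.h>

/* off_t_bits() is a hypothetical helper that returns the width of off_t,
 * in bits, as seen by this compilation unit. */
int off_t_bits(void)
{
    int bits = (int)(sizeof(off_t) * CHAR_BIT);

    /* Where the request is accepted, the size in use matches it. */
    assert(bits == _FILE_OFFSET_BITS);
    return bits;
}
```

A value that matches none of the supported sizes would, under the rules above, stop compilation with an error rather than silently select a default.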

For POSIX compatibility this method must not be affected by the #undef preprocessor directive. For example:

     #undef lseek

must not alter the size of type off_t in use for a call to lseek().

The functions that might be affected by this option are listed in 3.1.1.1 64-bit Versions of Interfaces.

The types, structures and symbolic constants that might be affected by this option are listed in 3.1.2.1 64-bit Versions of Headers.

It has been argued that there should be a new mode bit (or "magic number") on executable images to indicate whether or not the application is large file aware. This is not precluded by this specification. However, an argument against it is that it requires significant work. Specifically, kernel, compiler, loader, and library changes are needed. It is unclear how the mode bit would support a large file aware application that makes calls to a non-aware shared library.

Revision Information

23Feb96 Version 1.1

The 23Feb96 changes include:

Unix changed to UNIX throughout
Section 1.5 (Changes and Additions) second bullet (Changes to System Interfaces and Headers) added EFBIG
Section 2.2.1 (Changes to System Interfaces) changed "as a future" to "in a future".
Section 2.2.1.1 (aio_read), 2.2.1.1 (aio_write) and 2.2.1.20 (lio_listio) changed nbyte to aiocbp->aio_nbytes; added "is before the end-of-file and" before "is at or beyond" in the EOVERFLOW error.
Section 2.2.1.1 (aio_read), 2.2.1.1 (aio_write) and 2.2.1.20 (lio_listio) changed "greater than or equal to" to "greater than".
Section 2.2.1.4 (fclose, etc.), 2.2.1.7 (fgetc, etc.) and 2.2.1.11 (fprintf, etc.) changed "write beyond" to "write at or beyond".
Section 2.2.1.20 (lio_listio) prefixed lio_opcode with aiocbp->; changed order of phrases in EOVERFLOW and EFBIG (moved "the aiocbp->aio_lio_opcode is LIO_READ" to the front of the sentences); removed "before EOF" in the EOVERFLOW error condition; added "is before the end-of-file and" before "is greater than or equal to the offset maximum".
Section 2.2.2.6 (sys/types.h) and 3.1.2.11 (sys/types.h) changed "must be" to "are defined as" in the sentences starting "The types..".
Section 3.1.1.1 (64-bit Versions of SUS Interfaces) changed title of section to "64-bit Versions of Interfaces". Changed titles in references to match.
Section A.1.1.4.1 (fcntl) moved into A.3.1.1.1 (fcntl).
Section A.1.1.4 (Holes in the Protection Mechanism) body added.
Section A.1.2.1.1 (Offset Maximum and the 2G-1 File Size Limit) boldfaced "B" in "byte line"; changed "a lseek" to "an lseek"; changed "the resulting pointer to the next offset address will overflow" to "the resulting pointer to the next offset address and the file size itself would overflow"; changed title from "Offset Maximum - 2G-1 File Size Limit" to "Offset Maximum and the 2G-1 File Size Limit"; changed "cannot be performed because" to "cannot be performed at that position because" in the last paragraph.
Section A.2.1.1.4 (creat) changed sample code from if (creat(path, ...) < 0) { to if ((fd = creat(path, ...)) < 0) {.
Section A.2.1.1.6 (fgetpos, etc.) changed "this function" to "these functions" in second paragraph; added paragraph beginning "Another potentially serious..." and all that follows to the end of the section.
Section A.3.1.1 (Transitional Extensions...) changed "B.3.1.1.2" to "A.3.1.1.2" in subsection.
Section A.3.1.1.1 (fcntl) Merged sentence "The O_LARGEFILE flag may be set..." with the sentence "The O_LARGEFILE flag can expose..." moved in from A.1.1.4.1 (fcntl).
Section A.3.2.2 (Visibility of Transitional API) changed "Note that if" to "If" in fourth paragraph.
Section A.3.2.4 (Utilities:...) corrected reference to 3.1.2 in the second to the last paragraph to 3.1.2.1 64-bit Version of Headers.
Table of Contents corrected A.3 and A.3.1 heading titles.

24Feb96 Version 1.2

The 24Feb96 changes include:

Added link to Foreword and section.
Section 1.6 (Conformance) removed list, added text for section.
Section 2.2.1.11 (fprintf) changed "needs" to "needed" in the error text.
Section 3.1.2.12 (unistd.h) added LFS_ASYNCHRONOUS_IO version test macro.

01Mar96 Version 1.3

The 01Mar96 changes include:

Changed "Foreword" to "Acknowledgements".
Added body of Acknowledgements.
Section 1.6 (Conformance) 1st paragraph changed "may fail to" to "need not".
Section 3.3.4, Example 2 changed "had" to "has".
Section A.1.2.1.1 (Offset Maximum...) swapped "-" and ">" in top line.
Section A.2.1.1.4 (creat) corrected reference for fstat.
Section 3.3.3 (Utilities:...) corrected reference for Compilation Environment...

05Mar96 Version 1.4

The 05Mar96 changes include:

Changed Version 1.2 in 01Mar96 revision section to Version 1.3
Added additional contributors in the Acknowledgements.

20Mar96 Version 1.5

The 20Mar96 changes include:

Back by popular demand.... Larger fonts in the PostScript Version!
Section 1.2 (Requirements) In the text for "Be fully compliant to the SUS" changed "conversion to the proposed standard" to "conversion to this proposed standard" in the second from the last paragraph.
Section 1.4 (Concepts) Changed "file is larger" to "file size is larger" and changed "only support" to "support only".
Section 1.6 (Conformance) LOTS of changes. In summary: each statement of conformance ("A conforming implementation...") was separated into an individual paragraph and in each the phrases "described in" and "listed in" were changed to "specified in"; the version test macro required for each statement of conformance was added along with a reference to the section where the changes to the interfaces and/or headers are described; in the first statement of conformance parentheses were added around "except...lio_listio()" for clarity. Also deleted the last paragraph (beginning "Implementations which provide...").
Section 2.2.1.7 (fgetc) Removed extra period at end of EOVERFLOW description.
Section 2.2.1.19 (getrlimit) Changed commas before "otherwise" to semicolons in first and second paragraphs; changed "permit" to "might permit" and "do not" to "might not" in the fourth paragraph.
Section 3.0 (Transitional Extensions...) first paragraph: Added sentence beginning "Version test macros..." after the first sentence ("The interfaces...").
Section 3.1.2.8 (sys/resource.h) Added period after description of RLIM64_INFINITY.
Section 3.1.2.12 (unistd.h) In Version Test Macros section added to description of _LFS_ASYNCHRONOUS_IO beginning with "as specified in..."; added "and 3.1.2.2..." to description of _LFS64_ASYNCHRONOUS_IO; added "3.1.1.2 fcntl()..." to description of _LFS64_LARGEFILE; added "and 3.1.2.6..." to description of _LFS64_STDIO. The last paragraph of A.3.2.2 ("If _LFS64_STDIO...") was moved to 3.1.2.12 as a new paragraph in the description of _LFS64_STDIO. In the description of _LFS_LARGEFILE the phrase "the fseeko() and ftello()" was removed and the text beginning with "as specified in..." through the end of the sentence was added.
Section 3.2.1 (Optional Additional...) Changed criteria to criterion (last word of first paragraph).
Section A.1.1.2 (Open Protection...) Removed comma before "and so on" in the third paragraph.
Section A.1.2.1 (Offset Maximum) Added "it" between "that" and "is believed" in last sentence of the fifth paragraph. Also in the fifth paragraph, changed "only applies" to "applies only".
Section A.2.1.1.13 (getrlimit()...) Added text beginning "These values do not..." through "...is really meant." to the end of the first paragraph.
Section A.2.2.1 (General Porting...) In the first paragraph removed the phrase "there are four" and added "include" at the end of the sentence.
Section A.2.2.3 (Type Conversion...) Removed the last sentence of the last paragraph ("Utilities not directly...").
Section A.3.2.2 (Visibility of...) Second paragraph: added missing parenthesis at end of the sentence. Also moved last paragraph ("If _LFS64_STDIO is not defined...") to section 3.1.2.12 (unistd.h) as an additional paragraph in the _LFS64_STDIO description.
Acknowledgements first paragraph: changed "files sizes" to "file sizes" in two places and changed "at least 2**32-1" to "at most 2^31-1". In the list of contributors changed "Hewlett-Packard Inc." to "Hewlett-Packard Co."; changed "Sun Microsystems Corp." to "Sun Microsystems, Inc."; changed "Srimivasam" to "Srinivasan"; removed Art Herzog from Novell list; removed Carl Zeigler from SAS list; added The Santa Cruz Operation, Inc. contributors. Added "(now with Integrated Computer Solutions, Inc.)" after "Mark Hatch".
General: Changed "define[s,d] XXX as 1" to "define[s,d] XXX to be 1".