Last Update: 20Mar96
This support must be compatible with the existing Single UNIX Specification, and provide a path to conformance with subsequent versions. It must allow system vendors a cost-effective approach to adding these features to their existing products, and provide system vendors, software vendors, and users with a clear path for future products. The independent software vendors (ISVs) listed below gathered in a set of meetings with the UNIX systems vendors to develop a common set of APIs and modifications to the Single UNIX Specification to allow support for large files. We called these meetings the Large File Summit. For details of the meetings, how the proposals were developed, and the ISV requirements document, see http://www.sas.com/standards/large.file.
This work is being sent to the X/Open Base System Working Group so they can consider the changes that are suggested for the next generation of the Single UNIX Specification.
The individuals who participated in the Large File Summit meetings and on-line discussions were:
Amdahl Corp.: Dennis Chapman, John Haines
Convex Computer Corp.: Mike Carl, Peter Poorman, Tom White
Cray Research, Inc.: Rick Matthews
Data General Corp.: Dean Herington
Digital Equipment Corp.: Fred Glover, Ray Lanza, Peter Smith
Fujitsu: Chris Seabrook
HAL Computer Systems, Inc.: Prashant Dholakia, Howard Gayle, David H. Yamada
Hewlett-Packard Co.: Larry Dwyer, Hal Prince
IBM Corp.: Bill Baker, Mark Brown
MacNeal-Schwendler Corp.: David Lombard
NCR: Kevin Brasche, Shawn Shealy
NEC Systems Laboratory, Inc.: Jeff Forys
Novell: Bill Cox, John Kiger, Seth Rosenthal
NOVON Research Inc.: Brian Boyle
Oracle: Mark Johnson
Programmed Logic Corp.: Tim Williams, Steve Rago
Pyramid Technology Corporation: Ralph Campbell, Henry Robinson
SAS Institute Inc.: Mark Cates, Leigh Ihnen, Tom Truscott, Kelly Wyatt
Sequent Computer Systems: Gerrit Huizenga, Mike Spitzer
Siemens Nixdorf Inc.: Ralf Nolting, Klaus Thon
Silicon Graphics: Steve Cobb, Adam Sweeney
Stratus Computer Inc.: Tony Luck
Sun Microsystems, Inc.: Steve Chessin
SunSoft Inc.: Karen Barnes, Don Cragun, Karl Danz, Andy Roach, Glenn Skinner, Peter Van der Linden, Srinivasan Viswanathan
Sybase Inc.: Marc Sugiyama
Syncsort Inc.: Asokan
Tandem Computers: David M. VomLehn
The Santa Cruz Operation, Inc.: John Farley, Kurt Gollhardt, Art Herzog, Danielle Lahmani, Wen-Ling Lu, Dave Prosser
Unisoft: Guy Hadland
Unisys Corp.: Steve Beck, Bruce Jones, Scott Lurndal, Jim Soddy
UTG Inc.: Michael Dortch, Mark Hatch, Larry Lytle
Veritas: Craig Harmer, Michael Schmitz
Special thanks go to SAS Institute Inc., SunSoft, Silicon Graphics, and Convex Computer (now HP) for providing meeting rooms and logistics support. Hal Prince and Don Cragun provided technical guidance and kept us aware of the details. Mark Brown helped us understand how important it was to comply with existing standards. Bill Baker and Tom White worked hard typing early drafts and providing alternative ways to organize the document. Adam Sweeney and Howard Gayle kept us within reason. David VomLehn and Tom Truscott kept good notes and provided the minutes. Ray Lanza gave us rousing encouragement ("Just make everything 64 bits!!"). Mark Johnson quipped excellent summaries. Kelly Wyatt did the final edits and provided an excellent sanity check during the endgame. And special thanks go to Mark Hatch (now with Integrated Computer Solutions, Inc.) who organized the first meetings and got this effort going.
I really enjoyed participating and would like to express my gratitude to the members of the Large File Summit. In particular, I enjoyed participating with people who were so honestly motivated to make the right technical decisions. This was a great lesson in UNIX file system semantics and how the Open Systems Process works.
There are a couple of interesting features of this specification. First, it contains a method of supporting an industry-wide transition to full 64-bit APIs. Second, it specifies a set of changes to the Single UNIX Specification that will allow unlimited file offsets. The transition includes a way to add 64-bit file indexing without breaking current compliance with standards, and allows software developers to migrate existing sources and binaries to systems that support 64-bit file indexing.
This document is the result of a collaborative process that was open to all participants. The efforts of those who participated will best be rewarded by having this work accepted and used. I believe that this specification is an example of how well the industry can work together to solve problems that affect our ability to produce products that compete in the market.
John Carl Zeigler, jcz@utg.org
VP Technology, UTG Inc.
Cary, NC
To protect existing binaries from arbitrarily large files, a new value (offset maximum) will be part of the open file description. An offset maximum is the largest offset that can be used as a file offset. Operations attempting to go beyond the offset maximum will return an error. The offset maximum is normally established as the size of the off_t "extended signed integral type" used by the program creating the file description.
The open() function and other interfaces establish the offset maximum for a file description, returning an error if the file size is larger than the offset maximum at the time of the call. Returning errors when the offset maximum is (or is likely to be) exceeded protects existing binaries effectively.
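As a minimal sketch of how an application might react to this rule, the helper below reports a file that is too large for the off_t in use separately from other open() failures. The name open_for_read() and the message wording are assumptions made for illustration; they are not part of this specification.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open path read-only. On failure, print a message
 * that distinguishes EOVERFLOW (file size exceeds the offset maximum)
 * from other errors. Returns a file descriptor, or -1 with errno set. */
int open_for_read(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1) {
        int saved = errno;   /* fprintf() may change errno */

        if (saved == EOVERFLOW)
            fprintf(stderr, "%s: too large for the off_t in use\n", path);
        else
            fprintf(stderr, "%s: %s\n", path, strerror(saved));
        errno = saved;
    }
    return fd;
}
```

Because the error is raised at open() time, the message can name the file involved, which is harder to arrange once a later lseek() or read() has failed.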
The open() and fcntl() functions have been changed to support the offset maximum.
The fseeko() and ftello() functions have been added because the existing fseek() and ftell() do not use the required opaque types.
Data types, declarations and symbolic constants were added to or changed in headers.
A conforming implementation that provides asynchronous I/O interfaces and the extensions to them specified in 2.0 Changes to the Single UNIX Specification will define _LFS_ASYNCHRONOUS_IO to be 1 (see 3.1.2.12 <unistd.h>).
A conforming implementation that provides the explicit 64-bit interfaces will provide at least those interfaces specified in 3.1.1.1.3 Other Interfaces, 3.1.1.2 fcntl(), 3.1.1.3 open(), and 3.1.2 Transitional Extensions to Headers (except that changes specified in 3.1.2.2 <aio.h> and 3.1.2.6 <stdio.h> need not be supported) and will define _LFS64_LARGEFILE to be 1 (see 3.1.2.12 <unistd.h>).
A conforming implementation that defines _LFS64_LARGEFILE to be 1 and provides the explicit 64-bit interfaces for asynchronous I/O specified in 3.1.1.1.1 Asynchronous I/O Interfaces will define _LFS64_ASYNCHRONOUS_IO to be 1 (see 3.1.2.12 <unistd.h>).
A conforming implementation that defines _LFS64_LARGEFILE to be 1 and provides the explicit 64-bit STDIO interfaces specified in 3.1.1.1.2 STDIO Interfaces and 3.1.2.6 <stdio.h> will define _LFS64_STDIO to be 1 (see 3.1.2.12 <unistd.h>).
(Note the attribute "resource limits" as used in the SUS is not defined.)
For regular files, no data transfer will occur past the offset maximum established in the open file description associated with aiocbp->aio_fildes.
ERRORS
The following is an additional condition which may be detected synchronously or asynchronously:
- [EOVERFLOW]
- The file is a regular file, aiocbp->aio_nbytes is greater than 0 and the starting offset in aiocbp->aio_offset is before the end-of-file and is at or beyond the offset maximum in the open file description associated with aiocbp->aio_fildes.
Note: This is a new error condition.
For regular files, no data transfer will occur past the offset maximum established in the open file description associated with aiocbp->aio_fildes.
ERRORS
The following is an additional condition which may be detected synchronously or asynchronously:
- [EFBIG]
- The file is a regular file, aiocbp->aio_nbytes is greater than 0 and the starting offset in aiocbp->aio_offset is at or beyond the offset maximum in the open file description associated with aiocbp->aio_fildes.
Note: This is an additional EFBIG error condition.
The saved resource limits in the new process image are set to be a copy of the process's corresponding hard and soft resource limits.
These functions will fail if:
- [EFBIG]
- The file is a regular file and an attempt was made to write at or beyond the offset maximum associated with the corresponding stream.
Note: This is an additional EFBIG error condition.
An unlock (F_UNLCK) request in which l_len is non-zero and the offset of the last byte of the requested segment is the maximum value for an object of type off_t, when the process has an existing lock in which l_len is 0 and which includes the last byte of the requested segment, will be treated as a request to unlock from the start of the requested segment with an l_len equal to 0. Otherwise an unlock (F_UNLCK) request will attempt to unlock only the requested segment.
ERRORS
The fcntl() function will fail if:
- [EOVERFLOW]
- One of the values to be returned cannot be represented correctly.
- [EOVERFLOW]
- The cmd argument is F_GETLK, F_SETLK or F_SETLKW and the smallest or, if l_len is non-zero, the largest, offset of any byte in the requested segment cannot be represented correctly in an object of type off_t.
Note: These are new error conditions.
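The role of l_len equal to 0 can be sketched as follows. The helpers lock_whole_file() and unlock_whole_file() are hypothetical names introduced for this illustration; they lock and unlock from offset 0 to the largest possible offset, so the requested segment never needs an end offset that might not fit in an off_t.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: apply a lock of the given type (F_WRLCK, F_RDLCK
 * or F_UNLCK) from offset 0 onward. l_len == 0 means "through the
 * largest possible offset", so no end offset has to be computed. */
static int set_whole_file_lock(int fd, short type)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl);   /* -1 with errno set on failure */
}

/* Take a write lock on the whole file without blocking. */
int lock_whole_file(int fd)
{
    return set_whole_file_lock(fd, F_WRLCK);
}

/* Release any lock this process holds on the file. */
int unlock_whole_file(int fd)
{
    return set_whole_file_lock(fd, F_UNLCK);
}
```

A caller that instead locks an explicit segment would need to check for EOVERFLOW when the last byte of the segment cannot be represented in an off_t.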
The fdopen() function will preserve the offset maximum previously set for the open file description corresponding to fildes.
These functions will fail if data needs to be read and:
- [EOVERFLOW]
- The file is a regular file and an attempt was made to read at or beyond the offset maximum associated with the corresponding stream.
Note: This is a new error condition.
The fgetpos() function will fail if:
- [EOVERFLOW]
- The current value of the file position cannot be represented correctly in an object of type fpos_t.
Note: This is a new error condition.
The largest value that can be represented correctly in an object of type off_t will be established as the offset maximum in the open file description.
ERRORS
The fopen() and freopen() functions will fail if:
- [EOVERFLOW]
- The named file is a regular file and the size of the file cannot be represented correctly in an object of type off_t.
Note: This is a new error condition.
Variable        Value of name       Notes
FILESIZEBITS    _PC_FILESIZEBITS    3,4
These functions will fail if either the stream is unbuffered or the stream's buffer needed to be flushed and:
- [EFBIG]
- The file is a regular file and an attempt was made to write at or beyond the offset maximum.
Note: This is an additional EFBIG error condition.
The fseek() function will fail if:
- [EOVERFLOW]
- The resulting file offset would be a value which cannot be represented correctly in an object of type long.
Note: This is a new error condition.
The fseeko() function is identical to the modified fseek() except that the offset argument is of type off_t and the EOVERFLOW error is changed as follows:
ERRORS
- [EOVERFLOW]
- The resulting file offset would be a value which cannot be represented correctly in an object of type off_t.
Note: This is a new function.
These functions will fail if:
- [EOVERFLOW]
- The file size in bytes or the number of blocks allocated to the file or the file serial number cannot be represented correctly in the structure pointed to by buf.
Note: This is an additional EOVERFLOW error condition.
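As an illustration, a caller can treat EOVERFLOW from stat() as "the file exists but cannot be described with the types in use" rather than as a missing file. The enum size_status and the helper file_size() below are hypothetical names for this sketch only.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical result codes for this sketch. */
enum size_status { SIZE_OK, SIZE_TOO_LARGE, SIZE_ERROR };

/* Hypothetical helper: fetch the size of the file named by path.
 * SIZE_TOO_LARGE means stat() failed with EOVERFLOW: the file exists,
 * but a value does not fit in this program's struct stat. */
enum size_status file_size(const char *path, off_t *size)
{
    struct stat st;

    if (stat(path, &st) == 0) {
        *size = st.st_size;
        return SIZE_OK;
    }
    return errno == EOVERFLOW ? SIZE_TOO_LARGE : SIZE_ERROR;
}
```

An application that only wants to know whether a file exists can accept both SIZE_OK and SIZE_TOO_LARGE as "exists".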
These functions will fail if:
- [EOVERFLOW]
- One of the values to be returned cannot be represented correctly in the structure pointed to by buf.
Note: This is a new error condition.
The ftell() function will fail if:
- [EOVERFLOW]
- The current file offset cannot be represented correctly in an object of type long.
Note: This is a new error condition.
The ftello() function is identical to the modified ftell() except that the return value is of type off_t and the EOVERFLOW error is changed as follows:
ERRORS
- [EOVERFLOW]
- The current file offset cannot be represented correctly in an object of type off_t.
Note: This is a new function.
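The following sketch shows fseeko() and ftello() used together to measure a stream without passing offsets through type long. The helper name stream_size() is an assumption of this example, and the _LARGEFILE_SOURCE definition follows the compilation examples given later in this document.

```c
#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE 1   /* requests the fseeko()/ftello() declarations */
#endif
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: return the size in bytes of the file behind
 * stream, or (off_t)-1 with errno set (for example to EOVERFLOW).
 * The current stream position is restored before returning. */
off_t stream_size(FILE *stream)
{
    off_t saved = ftello(stream);
    off_t size;

    if (saved == (off_t)-1)
        return (off_t)-1;
    if (fseeko(stream, 0, SEEK_END) != 0)
        return (off_t)-1;
    size = ftello(stream);
    if (fseeko(stream, saved, SEEK_SET) != 0)
        return (off_t)-1;
    return size;
}
```

The same logic written with fseek() and ftell() would fail with EOVERFLOW as soon as the offset no longer fits in a long.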
The ftruncate() function will fail if:
- [EFBIG]
- The file is a regular file and length is greater than the offset maximum established in the open file description associated with fildes.
Note: This is an additional EFBIG error condition.
When using the getrlimit() function, if a resource limit can be represented correctly in an object of type rlim_t then its representation is returned; otherwise if the value of the resource limit is equal to that of the corresponding saved hard limit the value returned is RLIM_SAVED_MAX; otherwise the value returned is RLIM_SAVED_CUR.
When using the setrlimit() function, if the requested new limit is RLIM_INFINITY the new limit will be "no limit"; otherwise if the requested new limit is RLIM_SAVED_MAX the new limit will be the corresponding saved hard limit; otherwise if the requested new limit is RLIM_SAVED_CUR the new limit will be the corresponding saved soft limit; otherwise the new limit will be the requested value. In addition, if the corresponding saved limit can be represented correctly in an object of type rlim_t then it will be overwritten with the new limit.
The result of setting a limit to RLIM_SAVED_MAX or RLIM_SAVED_CUR is unspecified unless a previous call to getrlimit() returned that value as the soft or hard limit for the corresponding resource limit.
The determination of whether a limit can be correctly represented in an object of type rlim_t is implementation-dependent. For example, some implementations permit a limit whose value is greater than RLIM_INFINITY and others do not.
The exec family of functions also cause resource limits to be saved. (See 2.2.1.3 exec).
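A caller of getrlimit() can classify the returned value along the lines described above. The sketch below is illustrative only: enum limit_kind, classify_limit() and soft_file_size_limit() are hypothetical names, and RLIM_INFINITY is tested first because RLIM_SAVED_MAX and RLIM_SAVED_CUR need not be distinct from it.

```c
#include <sys/resource.h>

/* Hypothetical classification of a value returned by getrlimit(). */
enum limit_kind { LIMIT_NONE, LIMIT_SAVED_HARD, LIMIT_SAVED_SOFT, LIMIT_VALUE };

/* Classify v. A chain of comparisons is used instead of a switch because
 * the three constants may share one value on some implementations. */
enum limit_kind classify_limit(rlim_t v)
{
    if (v == RLIM_INFINITY)
        return LIMIT_NONE;
    if (v == RLIM_SAVED_MAX)
        return LIMIT_SAVED_HARD;   /* unrepresentable; equals saved hard limit */
    if (v == RLIM_SAVED_CUR)
        return LIMIT_SAVED_SOFT;   /* unrepresentable; equals saved soft limit */
    return LIMIT_VALUE;
}

/* Hypothetical helper: fetch and classify the soft file size limit.
 * Returns 0 on success, -1 with errno set if getrlimit() fails. */
int soft_file_size_limit(enum limit_kind *kind, rlim_t *value)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
        return -1;
    *kind = classify_limit(rl.rlim_cur);
    *value = rl.rlim_cur;
    return 0;
}
```

A value classified as LIMIT_SAVED_HARD or LIMIT_SAVED_SOFT can be passed back to setrlimit() unchanged to restore the saved limit, but should not be used in arithmetic.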
For regular files, no data transfer will occur past the offset maximum established in the open file description associated with aiocbp->aio_fildes.
ERRORS
The following are additional error codes which may be set for each aiocb control block:
- [EOVERFLOW]
- The aiocbp->aio_lio_opcode is LIO_READ, the file is a regular file, aiocbp->aio_nbytes is greater than 0, and the aiocbp->aio_offset is before the end-of-file and is greater than or equal to the offset maximum in the open file description associated with aiocbp->aio_fildes.
- [EFBIG]
- The aiocbp->aio_lio_opcode is LIO_WRITE, the file is a regular file, aiocbp->aio_nbytes is greater than 0, and the aiocbp->aio_offset is greater than or equal to the offset maximum in the open file description associated with aiocbp->aio_fildes.
Note: These are additional EFBIG and EOVERFLOW error conditions.
An F_ULOCK request in which size is non-zero and the offset of the last byte of the requested section is the maximum value for an object of type off_t, when the process has an existing lock in which size is 0 and which includes the last byte of the requested section, will be treated as a request to unlock from the start of the requested section with a size equal to 0. Otherwise an F_ULOCK request will attempt to unlock only the requested section.
ERRORS
The lockf() function will fail if:
- [EINVAL]
- The function argument is not one of F_LOCK, F_TLOCK, F_TEST or F_ULOCK; or size plus the current file offset is less than 0.
Note: This is a clarification of the EINVAL error condition.
- [EOVERFLOW]
- The offset of the first, or if size is not 0 then the last, byte in the requested section cannot be represented correctly in an object of type off_t.
Note: EOVERFLOW is a new error condition.
The lseek() function will fail if:
- [EOVERFLOW]
- The resulting file offset would be a value which cannot be represented correctly in an object of type off_t.
Note: This is a new error condition.
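Since lseek() reports this condition only through its return value, a caller has to test for (off_t)-1 before using the result. A minimal sketch, in which the helper name end_offset() is an assumption of this example:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: find the end-of-file offset of fd.
 * Returns 0 and stores the offset in *end, or returns -1 with errno set
 * (EOVERFLOW if the offset cannot be represented in an off_t). */
int end_offset(int fd, off_t *end)
{
    off_t pos = lseek(fd, 0, SEEK_END);

    if (pos == (off_t)-1)
        return -1;      /* *end is left untouched on failure */
    *end = pos;
    return 0;
}
```

Keeping the offset out of *end on failure avoids the common mistake of treating -1 as a valid position.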
The mmap() function will fail if:
- [EOVERFLOW]
- The file is a regular file and the value of off plus len exceeds the offset maximum established in the open file description associated with fildes.
Note: This is a new error condition.
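The condition "off plus len exceeds the offset maximum" can be checked without forming the sum, which could itself overflow. The sketch below is illustrative: range_within_offset_max() is a hypothetical name, and int64_t stands in for off_t so that the arithmetic is explicit.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the range starting at off with length len satisfies
 * off + len <= offset_max, and 0 otherwise. The comparison is made as
 * len <= offset_max - off so that off + len is never computed.
 * Assumption of this sketch: int64_t stands in for off_t. */
int range_within_offset_max(int64_t off, size_t len, int64_t offset_max)
{
    if (off < 0 || off > offset_max)
        return 0;
    return (uint64_t)len <= (uint64_t)(offset_max - off);
}
```

For example, with an offset maximum of 2147483647, a 10-byte range starting at 2147483637 fits, while an 11-byte range at the same offset does not.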
The largest value that can be represented correctly in an object of type off_t will be established as the offset maximum in the open file description.
ERRORS
The open() function will fail if:
- [EOVERFLOW]
- The named file is a regular file and the size of the file cannot be represented correctly in an object of type off_t.
Note: This is a new error condition.
For regular files, no data transfer will occur past the offset maximum established in the open file description associated with fildes.
ERRORS
The read() and readv() functions will fail if:
- [EOVERFLOW]
- The file is a regular file, nbyte is greater than 0, the starting position is before the end-of-file and the starting position is greater than or equal to the offset maximum established in the open file description associated with fildes.
Note: This is a new error condition.
The readdir() function will fail if:
- [EOVERFLOW]
- One of the values in the structure to be returned cannot be represented correctly.
Note: This is a new error condition.
For regular files, no data transfer will occur past the offset maximum established in the open file description associated with fildes.
ERRORS
These functions will fail if:
- [EFBIG]
- The file is a regular file, nbyte is greater than 0 and the starting position is greater than or equal to the offset maximum established in the open file description associated with fildes.
Note: This is an additional EFBIG error condition.
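A write loop that propagates errno lets the caller see EFBIG when the starting position reaches the offset maximum. The sketch below is illustrative, and write_all() is a hypothetical helper name, not an interface defined by this specification.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd, continuing
 * after partial writes. Returns 0 on success, or -1 with errno set by
 * write(); errno == EFBIG indicates that a write started at or beyond
 * the offset maximum or another file size limit. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```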
Name: FILESIZEBITS
Description: Minimum number of bits needed to represent, as a signed integer value, the maximum size of a regular file allowed in the specified directory.
Acceptable Value: *
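An application can query FILESIZEBITS for a directory with pathconf() and _PC_FILESIZEBITS and convert the result into a maximum file size. The sketch below is illustrative only; filesize_bits() and max_size_for_bits() are hypothetical helper names, and int64_t is assumed to be wide enough for the result.

```c
#include <stdint.h>
#include <unistd.h>

/* Largest value representable in a signed integer of the given number
 * of bits. Assumption of this sketch: results are capped at INT64_MAX. */
int64_t max_size_for_bits(long bits)
{
    if (bits < 1)
        return 0;
    if (bits >= 64)
        return INT64_MAX;
    return (int64_t)((UINT64_C(1) << (bits - 1)) - 1);
}

/* Hypothetical helper: FILESIZEBITS for directory dir, or -1 if the
 * value is indeterminate or the variable is not supported. */
long filesize_bits(const char *dir)
{
#ifdef _PC_FILESIZEBITS
    return pathconf(dir, _PC_FILESIZEBITS);
#else
    (void)dir;
    return -1;
#endif
}
```

For a directory reporting 32 bits, max_size_for_bits() gives 2147483647, the familiar limit of a 32-bit off_t.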
int fseeko(FILE *stream, off_t offset, int whence);
off_t ftello(FILE *stream);
The type off_t is defined through typedef as described in <sys/types.h>.
RLIM_SAVED_MAX    A value of type rlim_t indicating an unrepresentable saved hard limit.
RLIM_SAVED_CUR    A value of type rlim_t indicating an unrepresentable saved soft limit.
On implementations where all resource limits are representable in an object of type rlim_t, RLIM_SAVED_MAX and RLIM_SAVED_CUR need not be distinct from RLIM_INFINITY.
blkcnt_t st_blocks number of blocks allocated for this object.
fsblkcnt_t f_blocks    total number of blocks in the file system in units of f_frsize.
fsblkcnt_t f_bfree     total number of free blocks.
fsblkcnt_t f_bavail    number of free blocks available to non-privileged process.
fsfilcnt_t f_files     total number of file serial numbers.
fsfilcnt_t f_ffree     total number of free file serial numbers.
fsfilcnt_t f_favail    number of free file serial numbers available to non-privileged process.
blkcnt_t      Used for file block counts.
fsblkcnt_t    Used for file system block counts.
fsfilcnt_t    Used for file system file counts.
The types blkcnt_t and off_t are defined as extended signed integral types.
The types fsblkcnt_t, fsfilcnt_t, and ino_t are defined as extended unsigned integral types.
_PC_FILESIZEBITS
The following utilities will support files of any size up to the maximum that can be created by the implementation. This support includes correct writing of file size related values (such as file sizes and offsets, line numbers, and block counts) and correct interpretation of command line arguments that contain such values.
basename    return non-directory portion of pathname
cat         concatenate and print files
cd          change working directory
chgrp       change file group ownership
chmod       change file modes
chown       change file ownership
cksum       write file checksums and sizes
cmp         compare two files
cp          copy files
dd          convert and copy a file
df          report free disk space
dirname     return directory portion of pathname
du          estimate file space usage
find        find files
ln          link files
ls          list directory contents
mkdir       make directories
mv          move files
pathchk     check pathnames
pwd         return working directory name
rm          remove directory entries
rmdir       remove directories
sh          shell, the standard command language interpreter
sum         print checksum and block or byte count of a file
test        evaluate expression
touch       change file access and modification times
ulimit      set or report file size limit
Exceptions to the requirement that utilities support files of any size up to the maximum are:
Pathname expansion will not fail due to the size of a file.
Shell input and output redirections will have an implementation-specific offset maximum that will be established in the open file description.
The pax utility is not able to handle arbitrary file sizes. There is currently a proposal in ballot in IEEE Project 1003.2b to address this issue.
The transitional extensions in this section are intended to be temporary. While an application using this specification may be using non-POSIX conforming transitional extensions to operating system functions, this does not require that system vendors break their POSIX compliance. This specification is intended to be compatible with the standards. The transitional extensions are provided so that system vendors may define a common set of large file capable extensions to their current compliant systems without violating that compliance.
aio_cancel64() aio_error64() aio_fsync64() aio_read64() aio_return64() aio_suspend64() aio_write64() lio_listio64()
fgetpos64() fopen64() freopen64() fseeko64() fsetpos64() ftello64() tmpfile64()
creat64() fstat64() fstatvfs64() ftruncate64() ftw64() getrlimit64() lockf64() lseek64() lstat64() mmap64() nftw64() open64() readdir64() setrlimit64() stat64() statvfs64() truncate64()
The following additional value may be used in constructing oflag:
- O_LARGEFILE
- If set, the offset maximum in the open file description will be the largest value that can be represented correctly in an object of type off64_t.
The behavior of the following additional values is equivalent to the corresponding Single UNIX Specification value (F_GETLK, F_SETLK, F_SETLKW), but they take a struct flock64 argument rather than a struct flock argument.
- F_GETLK64
- F_SETLK64
- F_SETLKW64
The following additional value may be used in constructing oflag:
- O_LARGEFILE
- If set, the offset maximum in the open file description will be the largest value that can be represented correctly in an object of type off64_t.
ERRORS
The open() function will fail if:
- [EOVERFLOW]
- The named file is a regular file and either O_LARGEFILE is not set and the size of the file cannot be represented correctly in an object of type off_t or O_LARGEFILE is set and the size of the file cannot be represented correctly in an object of type off64_t.
APPLICATION USAGE
Note that using open64() is equivalent to using open() with O_LARGEFILE set in oflag.
Note: For the transitional extensions these changes to open() are in place of the changes described in 2.2.1.24 open() relating to the changes to the SUS.
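The equivalence between open64() and open() with O_LARGEFILE can be sketched as below. The helper name open_large() is an assumption of this example, as is the fallback definition of O_LARGEFILE as 0 for systems whose headers do not provide the transitional flag.

```c
#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE 1   /* requests the transitional extensions */
#endif
#include <fcntl.h>

#ifndef O_LARGEFILE
#define O_LARGEFILE 0   /* assumption: no transitional flag on this system */
#endif

/* Hypothetical helper: open an existing file with O_LARGEFILE added to
 * oflag, which this specification treats as equivalent to open64().
 * Not suitable for O_CREAT, since no mode argument is passed.
 * Returns a file descriptor, or -1 with errno set. */
int open_large(const char *path, int oflag)
{
    return open(path, oflag | O_LARGEFILE);
}
```

A descriptor obtained this way has the off64_t offset maximum, so it should be used with the 64-bit interfaces such as lseek64() rather than handed to code that assumes a smaller off_t.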
blkcnt_t fsblkcnt_t fsfilcnt_t fpos_t ino_t off_t rlim_t
struct dirent struct flock struct rlimit struct stat struct statvfs
F_GETLK F_SETLK F_SETLKW RLIM_INFINITY RLIM_SAVED_MAX RLIM_SAVED_CUR
off64_t aio_offset
The following are declared as functions and may be defined as macros.
int aio_read64(struct aiocb64 *aiocbp);
int aio_write64(struct aiocb64 *aiocbp);
int lio_listio64(int mode, struct aiocb64 *const list[], int nent, struct sigevent *sig);
int aio_error64(const struct aiocb64 *aiocbp);
ssize_t aio_return64(struct aiocb64 *aiocbp);
int aio_cancel64(int fildes, struct aiocb64 *aiocbp);
int aio_suspend64(const struct aiocb64 *const list[], int nent, const struct timespec *timeout);
int aio_fsync64(int op, struct aiocb64 *aiocbp);
ino64_t d_ino    file serial number.
The following is declared as a function and may also be defined as a macro:
struct dirent64 *readdir64(DIR *dirp);
off64_t l_start    relative offset in bytes.
off64_t l_len      size.
Additional values for cmd used by fcntl():
F_GETLK64     Get record locking information using struct flock64.
F_SETLK64     Establish a record lock using struct flock64.
F_SETLKW64    Establish a record lock, blocking, using struct flock64.
An additional file status flag, used by open() and fcntl(), is defined:
O_LARGEFILE    The offset maximum in the open file description is the largest value that can be represented correctly in an object of type off64_t.
The following are declared as functions and may also be defined as macros:
int creat64(const char *path, mode_t mode);
int open64(const char *path, int oflag, ...);
int ftw64(const char *path, int (*fn)(const char *, const struct stat64 *, int), int ndirs);
int nftw64(const char *path, int (*fn)(const char *, const struct stat64 *, int, struct FTW *), int depth, int flags);
fpos64_t    Type containing all information needed to specify uniquely every position within a file in which the largest offset can be represented in an object of type off64_t.
The following are declared as functions and may also be defined as macros:
int fgetpos64(FILE *stream, fpos64_t *pos);
FILE *fopen64(const char *filename, const char *mode);
FILE *freopen64(const char *filename, const char *mode, FILE *stream);
int fseeko64(FILE *stream, off64_t offset, int whence);
int fsetpos64(FILE *stream, const fpos64_t *pos);
off64_t ftello64(FILE *stream);
FILE *tmpfile64(void);
void *mmap64(void *addr, size_t len, int prot, int flags, int fd, off64_t offset);
rlim64_t    type used for limit values.
The type rlim64_t must be an extended unsigned arithmetic type that can represent correctly any non-negative value of an off64_t.
The following symbolic constants are defined:
RLIM64_INFINITY     A value of type rlim64_t indicating no limit.
RLIM64_SAVED_MAX    A value of type rlim64_t indicating an unrepresentable saved hard limit.
RLIM64_SAVED_CUR    A value of type rlim64_t indicating an unrepresentable saved soft limit.
On implementations where all resource limits are representable in an object of type rlim64_t, RLIM64_SAVED_MAX and RLIM64_SAVED_CUR need not be distinct from RLIM64_INFINITY.
The rlimit64 structure is defined in the same way as the rlimit structure in the Single UNIX Specification with the exception of the following members:
rlim64_t rlim_cur    the current (soft) limit.
rlim64_t rlim_max    the hard limit.
The following are declared as functions and may also be defined as macros:
int getrlimit64(int resource, struct rlimit64 *rlp);
int setrlimit64(int resource, const struct rlimit64 *rlp);
ino64_t st_ino          file serial number.
off64_t st_size         file size in bytes.
blkcnt64_t st_blocks    number of blocks allocated for this object.
The following are declared as functions and may also be defined as macros:
int fstat64(int fildes, struct stat64 *buf);
int lstat64(const char *, struct stat64 *buf);
int stat64(const char *, struct stat64 *buf);
fsblkcnt64_t f_blocks    total number of blocks in the file system in units of f_frsize.
fsblkcnt64_t f_bfree     total number of free blocks.
fsblkcnt64_t f_bavail    number of free blocks available to non-privileged process.
fsfilcnt64_t f_files     total number of file serial numbers.
fsfilcnt64_t f_ffree     total number of free file serial numbers.
fsfilcnt64_t f_favail    number of free file serial numbers available to non-privileged process.
The following are declared as functions and may also be defined as macros:
int statvfs64(const char *path, struct statvfs64 *buf);
int fstatvfs64(int fildes, struct statvfs64 *buf);
blkcnt64_t      Used for file block counts.
fsblkcnt64_t    Used for file system block counts.
fsfilcnt64_t    Used for file system file counts.
ino64_t         Used for file serial numbers.
off64_t         Used for file sizes.
The types blkcnt64_t and off64_t are defined as extended signed integral types.
The types fsblkcnt64_t, fsfilcnt64_t, and ino64_t are defined as extended unsigned integral types.
int lockf64(int fildes, int function, off64_t size);
off64_t lseek64(int fildes, off64_t offset, int whence);
int ftruncate64(int fildes, off64_t length);
int truncate64(const char *path, off64_t length);
Version Test Macros:
_LFS_LARGEFILE is defined to be 1 if the implementation supports the interfaces as specified in 2.2.1 Changes to System Interfaces except that implementations need not provide the asynchronous I/O interfaces: aio_read(), aio_write(), and lio_listio().
_LFS_ASYNCHRONOUS_IO is defined to be 1 if the implementation supports the asynchronous I/O interfaces: aio_read(), aio_write(), and lio_listio() as specified in 2.2.1 Changes to System Interfaces.
_LFS64_ASYNCHRONOUS_IO is defined to be 1 if the implementation supports all the transitional extensions listed in 3.1.1.1.1 Asynchronous I/O Interfaces and 3.1.2.2 <aio.h>.
_LFS64_LARGEFILE is defined to be 1 if the implementation supports all the transitional extensions listed in 3.1.1.1.3 Other Interfaces, 3.1.1.2 fcntl(), 3.1.1.3 open() and 3.1.2 Transitional Extensions to Headers, except changes specified in 3.1.2.2 <aio.h> and 3.1.2.6 <stdio.h> need not be supported.
_LFS64_STDIO is defined to be 1 if the implementation supports all the transitional extensions listed in 3.1.1.1.2 STDIO Interfaces and 3.1.2.6 <stdio.h>. If _LFS64_STDIO is not defined to be 1 and the underlying file description associated with stream has O_LARGEFILE set then the behavior of the Standard I/O functions is unspecified.
Constants for Functions:
_CS_LFS_CFLAGS for confstr().
_CS_LFS_LDFLAGS for confstr().
_CS_LFS_LIBS for confstr().
_CS_LFS_LINTFLAGS for confstr().
_CS_LFS64_CFLAGS for confstr().
_CS_LFS64_LDFLAGS for confstr().
_CS_LFS64_LIBS for confstr().
_CS_LFS64_LINTFLAGS for confstr().
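A program can test these macros at compile time to decide which interfaces it may use. The sketch below collects them into a bit mask; the enum constants and the function lfs_features() are hypothetical names introduced for this illustration.

```c
#include <unistd.h>

/* Hypothetical bit values for this sketch. */
enum {
    HAVE_LFS_LARGEFILE   = 1,
    HAVE_LFS_AIO         = 2,
    HAVE_LFS64_LARGEFILE = 4,
    HAVE_LFS64_AIO       = 8,
    HAVE_LFS64_STDIO     = 16
};

/* Report which version test macros <unistd.h> defines to be 1.
 * The "+ 0" keeps each #if well-formed if a macro is defined as empty. */
int lfs_features(void)
{
    int mask = 0;

#if defined(_LFS_LARGEFILE) && (_LFS_LARGEFILE + 0) == 1
    mask |= HAVE_LFS_LARGEFILE;
#endif
#if defined(_LFS_ASYNCHRONOUS_IO) && (_LFS_ASYNCHRONOUS_IO + 0) == 1
    mask |= HAVE_LFS_AIO;
#endif
#if defined(_LFS64_LARGEFILE) && (_LFS64_LARGEFILE + 0) == 1
    mask |= HAVE_LFS64_LARGEFILE;
#endif
#if defined(_LFS64_ASYNCHRONOUS_IO) && (_LFS64_ASYNCHRONOUS_IO + 0) == 1
    mask |= HAVE_LFS64_AIO;
#endif
#if defined(_LFS64_STDIO) && (_LFS64_STDIO + 0) == 1
    mask |= HAVE_LFS64_STDIO;
#endif
    return mask;
}
```

For example, a program might call fopen64() only when the HAVE_LFS64_STDIO bit is set, since otherwise the behavior of the Standard I/O functions on an O_LARGEFILE description is unspecified.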
If -o largefiles is specified then there is no such guarantee.
The default behavior is implementation-dependent.
Example 1:
An example of compiling a program with a "large" off_t and that uses fseeko() and ftello() and uses yacc:
    c89 -D_LARGEFILE_SOURCE -o foo \
        $(getconf LFS_CFLAGS) y.tab.c b.o \
        $(getconf LFS_LDFLAGS) \
        -ly $(getconf LFS_LIBS)
Example 2:
An example of compiling a program with a "large" off_t and that does not use fseeko() and ftello() and has no application specific libraries:
    c89 $(getconf LFS_CFLAGS) a.c \
        $(getconf LFS_LDFLAGS) \
        $(getconf LFS_LIBS)
Example 3:
An example of compiling a program with a "default" off_t and that uses fseeko() and ftello():
    c89 -D_LARGEFILE_SOURCE a.c
Example 4:
An example of compiling a program using transitional versions of SUS interfaces such as lseek64() and fopen64():
    c89 -D_LARGEFILE64_SOURCE \
        $(getconf LFS64_CFLAGS) a.c \
        $(getconf LFS64_LDFLAGS) \
        $(getconf LFS64_LIBS)
Example 5:
An example of running lint on a program with a "large" off_t:
    lint -D_LARGEFILE_SOURCE \
        $(getconf LFS_LINTFLAGS) ... \
        $(getconf LFS_LIBS)
Example 6:
An example of running lint on a program using the transitional API:
    lint -D_LARGEFILE64_SOURCE \
        $(getconf LFS64_LINTFLAGS) ... \
        $(getconf LFS64_LIBS)
These examples show the need for the additional variables LFS_CFLAGS, LFS_LDFLAGS, LFS_LIBS, LFS_LINTFLAGS, LFS64_CFLAGS, LFS64_LDFLAGS, LFS64_LIBS and LFS64_LINTFLAGS to be reported by getconf.
Implementations may permit the linking of object files that are compiled with differing off_t environments. For example, an object module compiled with a 32-bit off_t can be linked with an object module compiled with a 64-bit off_t. In such a case, both 32-bit off_t and 64-bit off_t API calls may be used on the same file descriptor. Implementations may instead disallow this linking.
Returning a "lie" to allow for common uses of a function (e.g. use of stat() to determine if a file exists) could inadvertently cause a correctly written application to operate incorrectly.
It is conceivable that returning a "lie" could keep an incorrectly written application from malfunctioning in a way that creates a serious problem, but no such applications are known to exist. (Of course it would be easy to contrive one.)
PASC Interpretation reference 1003.1-90 #38 completed by the POSIX.1 interpretations committee confirms that POSIX.1 conforming implementations are not allowed to lie to applications. This interpretation explicitly states that if the file size will not fit in an object of type off_t, fstat() must fail. In addition, PASC Interpretation reference 1003.1-90 #75 went on to clarify that EOVERFLOW would be a legal extension to report this condition.
The size of file on which a program is able to operate is determined by the off_t in use for the open(). The open protection rule ensures that old binaries do not operate on files that are too large to handle correctly, and prevents the binaries from generating incorrect results or corrupting the data in the file.
An argument against open protection is that requiring opens to fail will break some binaries that would have worked perfectly well otherwise. For example, a cat program does a loop of open(), read()/write() pairs, and close() for each input file. This program would unnecessarily break due to open protection. But this "Let it Run" argument is flawed in that there is no known utility which fails due to open protection but would work "perfectly well" if only we "let it run". Real versions of the cat program use fstat() to determine whether the input and output files are the same, have a -n option (count newlines) which will fail on sufficiently large files and so on.
Another argument against open protection is that it is unnecessary because an error will be returned as soon as a function cannot return the correct result of an operation ("No Lies" rule). However, most programs check for the success of the open() call, but many do not check for overflow or error after lseek() and other calls. An audit of the standard utilities uncovered numerous examples.
An argument for open protection is that it increases the likelihood of an immediate and informative error message. The error message is likely to include the name of the file that could not be opened. It is much less likely that an lseek() error message will be as immediate or as informative. The delay in, or complete lack of, reporting such errors may result in "silent failure".
Another argument for open protection is that there are numerous plausible scenarios in which this rule avoids serious harm. It prevents typical implementations of the touch utility from truncating large files to 0 length (see A.2.1.1.4 creat()). It can prevent silent failure, which has been demonstrated to occur in at least one commercial data management system. With open protection a commercial backup/restore system will report errors on files that might otherwise result in a corrupted backup tape. It prevents typical implementations of dbm/ndbm from returning incorrect results from a database whose size exceeds the off_t in use for the dbm routines.
There are two separate issues for this rule, which are that there is an application-dependent limit on read() and write(), and that the limit is "the offset maximum established in the open file description". The second issue is deferred to A.1.2.1 Offset Maximum. The first issue, that there be an application-dependent limit, is considered here.
There are two assertions upon which many applications rely:
The write limit avoids the unintuitive situation in which a program could create a file too large for it to open (due to open protection). This could result in a serious problem. "Can you imagine the reaction of someone who has 1.9G of data, and all of a sudden, the DBMS can no longer open the file? I wouldn't want to be working in tech support that day."
An argument for the write limit is that it keeps a program from creating a file too large for it to handle properly. An argument for the read limit is that it is a simple way to cover the hole where a file grows after it is opened.
An argument for the read/write limit rule is that generating an error at this limit provides the earliest possible warning of an incompatibility problem that could result in lost or corrupted data if the application were to continue.
An argument against the read/write limit rule is that it results in unnecessary breakage of binaries that would have worked perfectly well otherwise. This is the "Let it Run" argument, but as noted earlier few if any such programs exist.
Another argument against the read/write limit rule is that implementing it is expensive and complex. But it has already been implemented and found not to be either expensive or complex (an analysis appears in A.1.2.1 Offset Maximum).
Another argument against the read/write limit rule is that it can result in a truncated log file record (hence corrupting the log file). But this truncation and corruption can also occur due to insufficient disk space or RLIMIT_FSIZE, and indeed the standards require that this occur.
Another argument against the read/write limit rule is that instead one can use the existing file size resource limit (RLIMIT_FSIZE). But this is not a useful defense in a mixed off_t environment because it unnecessarily restricts the size of files created by programs which support a larger off_t. The practical effect will be that use of RLIMIT_FSIZE in this way will inconvenience users and they will unlimit themselves and then there will be no write limit. So this is a false, although attractive, argument.
Another argument against the read/write limit rule is that instead there can be a mount option which limits the maximum size of a file created in the file system. But regardless of other merits for such an option, it does not provide a useful defense in a mixed off_t environment because it unnecessarily restricts the size of files created by programs which support a larger off_t. The practical effect will be that the system administrator will be pressured into remounting the file system with no limit and then there will be no write limit. So this is another false, although attractive, argument.
The offset maximum is an unusual part of this specification as it is associated with the file description whereas in all other cases the limit is determined by the size of the type that is used for the call. But determining the latter for read/write would be extremely difficult in an environment in which a single process contains calls with differing sizes of off_t in use (this environment is not part of this section of the specification, but it is part of the transitional specification). In such an environment it would be necessary to determine the size of off_t for every function that might result in a read() or write(). That would include putchar(), fwrite(), fputs(), fprintf(), puts(), etc. The number of the routines that might potentially do a read() or write() is too large for such an implementation to be practical.
It is possible that while a "small" application has a file open another application with a larger off_t can extend the file beyond the size of the small application's off_t. This leads to a situation where the small application has a file descriptor which refers to a file too large for it to be able to process correctly. That is, open protection has been lost. The application will still have some protection due to "No Lies" and the "Read/Write Limit", but these are less effective protections. It is believed that this case is sufficiently unlikely that it may be safely ignored.
As an added protection, it has been suggested that all file calls should fail whenever the size of the file cannot be represented correctly in an object of type off_t. This would defend against the file growth scenario described above. But checking file size on each read/write might hurt performance in some cases and also it was not considered an important defense. It would also have the putchar(), fwrite(), etc. implementation problem.
It has been suggested that a file should not be permitted to be extended beyond the size of the smallest offset maximum in any open file description that refers to the file. It is believed that this is an unnecessary complication, cannot be enforced for some distributed file systems and applies only to a situation that it is believed may be safely ignored.
The value of the offset maximum in an open file description will not affect the semantics of operations related to other open file descriptions or of operations which create new open file descriptions, including other open file descriptions which refer to the same file.
An argument against offset maximum is that it is expensive and complex. But that is not the case. The only implementation that will matter for years is for 64-bit off_t which
    <-      "small"    -> | <-   "large"   >-
    ----------   -----------------------
    | b | b | ::: | b | B | L | L | L | :::
    ^---^---^-   -^---^---^---^---^---^-
    0   1   2     2G  2G  2G
                  -2  -1

Although an lseek() can be done to the 2G-1 offset, a read() or write() cannot be performed at that position because when B (counting number 2G, but offset 2G-1) is read or written, the resulting pointer to the next offset address and the file size itself would overflow.
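The boundary condition described above can be expressed as a small calculation. The following is a minimal sketch and not any vendor's implementation; the function name writable_bytes and the use of long long to stand in for file offsets are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: how many of `count` bytes may be written at `offset`
 * when the open file description has offset maximum `offset_max`
 * (2^31 - 1 for a 32-bit off_t).  A result of 0 for a non-zero request
 * corresponds to the case in which the write must fail. */
size_t writable_bytes(long long offset, size_t count, long long offset_max)
{
    unsigned long long room;

    if (offset < 0 || offset >= offset_max)
        return 0;                       /* starts at or beyond the limit */
    room = (unsigned long long)(offset_max - offset);
    return count < room ? count : (size_t)room;
}
```

With offset_max equal to 2G-1, a request at offset 2G-1 (byte B) yields 0, while a request at offset 2G-2 is clipped to a single byte.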
An application can inherit, via the exec family of functions, a file descriptor that is associated with a file whose size exceeds the largest value that can be represented correctly by the off_t that is in use by the application. An example is if a shell that was compiled with a 64-bit off_t does input or output redirection of a 10 gigabyte file and then executes a program which was compiled with a 32-bit off_t. In such a case the large file unaware application will function until attempting an operation from which the results cannot be correctly returned.
Most inherited files are due to shell redirection; the other cases are rare and typically under the complete control of a single application provider. The cases that are of primary concern are:

    old_binary < large_file

and

    old_binary > large_file

In these cases a pre-existing application binary, old_binary, is given a file descriptor to a file that it would not have been able to open for itself and would be able to read and write past the limit that would have been established by the open(). The concern is that the application will do something destructive or generate incorrect results since it is not expecting a file to be so large.
In comparison, consider the following cases:
    a.out | old_binary

and

    old_binary | a.out

There is no limit to the amount of data that may be passed through a pipe. In the first case the application named a.out may push more data through the pipe than can be contained in a small file. In the second case a.out may be willing to read more data than can be contained in a small file. If a pre-existing application binary has problems with inherited file descriptors that refer to large files then it is likely to have a pre-existing problem when using a pipe for large amounts of data. While it is true that the two sets of cases are not completely equivalent, the above examples show that pre-existing binaries have had the potential to see data streams larger than the amount of data that can be contained in a small file.
Another reason it is believed that the inheritance of file descriptors does not cause problems is that the majority of existing applications do not perform seek operations on standard input or standard output.
The NFS version 2 protocol is effectively a 32-bit application since it cannot handle file sizes larger than 2^31-1 bytes. Any attempt by an NFS V2 client to access a large file (read(), write(), stat(), etc.) should be rejected by the server since the server knows the file is large and knows the application (NFS V2) is not "large file aware". This test is trivial and requires no more performance penalty than the tests for any other file system type.
The NFS version 3 protocol is "large file aware" since it can handle file sizes up to 2^63-1 bytes. An NFS V3 server would handle all requests without change, even if the request involves a large file. It is up to the NFS V3 client code to determine if the application accessing a file is "large file aware" or not. This should be handled in the standard fashion in the OS on the client side machine using the attributes returned by the NFS operation or the cached file attributes. While this does not provide perfect protection or immediate detection of files that have grown beyond 2^31-1 bytes since being opened, it is no more broken than the rest of NFS. (See below for more discussion of cached file attributes).
This does not address the issue of NFS V3 clients that are not prepared to handle "large files". If they are carefully written and obey the NFS V3 protocol they should realize that files can be larger than 2^31-1 bytes and handle this condition appropriately, probably by failing the operation (they would know this when a stat(), read(), write(), etc. operation returned a file size larger than 2^31-1). However, there are probably NFS V3 clients that are not carefully written. We really can't do much about that.
Cached Attributes: with the NFS V3 protocol, clients are not required to cache the file attributes, and servers are not required to return the file attributes with each operation. If the file attributes are returned with each operation, it is easy to determine if the file has grown past the large file limit. If not, the cached attributes can be consulted.
If the client does not cache attributes, then it will either have to request the attributes from the server over the wire (adversely affecting performance) or assume the file has not grown in size since it was opened. This specification pretty much requires the client code to check the file size at open.
Because of the stateless nature of NFS, it is difficult to ensure that a large-file unaware application cannot operate on a file that has grown from small to large. This is for the same reasons that NFS cannot implement standard UNIX file semantics. However, it is easy to ensure that a large-file unaware application does not grow a small file to become large (since the offset and length of each write are determined at the client, the client can fail any operation where the offset plus length exceeds the small file limit). It is also easy to ensure that a large-file unaware application does not read past the small file limit.
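The client-side test mentioned above (fail any operation where the offset plus length exceeds the small file limit) might look like the following. This is a sketch only; the names SMALL_FILE_LIMIT and within_small_file_limit are hypothetical and do not come from any NFS implementation.

```c
#include <stdint.h>

#define SMALL_FILE_LIMIT INT64_C(2147483647)   /* 2^31 - 1, assumed limit */

/* Returns 1 if a request covering `length` bytes starting at `offset`
 * stays within the small file limit, 0 otherwise.  The comparison is
 * arranged so that the sum offset + length is never computed. */
int within_small_file_limit(int64_t offset, uint32_t length)
{
    if (offset < 0 || offset > SMALL_FILE_LIMIT)
        return 0;
    return (int64_t)length <= SMALL_FILE_LIMIT - offset;
}
```

A client could apply the same test to both reads and writes issued on behalf of a large-file unaware application.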
    if (stat(path, ...) < 0) {
        /* assume file does not exist, so create it */
        if ((fd = creat(path, ...)) < 0) {
            /* print out error text */
        }
    }

In this example the stat() function is being used to determine the existence of a file. But if the file size cannot be represented correctly in an object of type off_t then stat() will fail (see 2.2.1.14 fstat(), lstat() and stat()) and if creat() did not then fail it would have the unintended effect of truncating the file to 0 length. Many applications and standard utilities have code similar to this example, including typical implementations of the touch utility.
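An application that wishes to avoid relying on creat() failing could distinguish "does not exist" from other stat() failures before creating the file. The following is a minimal sketch; the function name path_exists is hypothetical, and it assumes the implementation reports the too-large case with EOVERFLOW.

```c
#include <errno.h>
#include <sys/stat.h>

/* Returns 1 if `path` exists (including the case where stat() fails with
 * EOVERFLOW because the size does not fit in off_t), 0 if stat() reports
 * ENOENT, and -1 on any other error. */
int path_exists(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == 0)
        return 1;
    if (errno == EOVERFLOW)
        return 1;       /* file exists but is too large to describe */
    if (errno == ENOENT)
        return 0;
    return -1;
}
```

A caller would invoke creat() only when the result is 0, so that a large file is never truncated by mistake.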
Several existing implementations of fcntl() permit locking the byte whose offset is the maximum value that can be represented correctly in an object of type off_t, even though write() cannot write to that offset. This specification permits that behavior.
The fcntl() function will fail if the cmd argument is F_GETLK and the first lock which blocks the lock description has a starting offset or length which cannot be represented correctly in an object of type off_t. Information about such a lock cannot be correctly returned.
Discussion of the semantics of fcntl() locks that cross the off_t boundary resulted in six competing proposals:
An advantage of 2, 4, and 6 is that they do not change existing behavior of a 32-bit application.
Proposals 1 and 5 can result in a new type of failure in the case where the program creates a lock with l_len equal to 0 and then clips off the beginning leaving behind an unrepresentable lock.
Proposal 4 precludes truly "whole file" locking.
Proposal 6 was adopted because it preserves existing 32-bit behavior and is less disruptive than proposal 2 (which extends lock requests in addition to unlock requests).
The fcntl() and lockf() functions will fail if the offset of the first byte in the region or, when l_len (size) is non-zero, the offset of the last byte in the region exceeds the largest possible value in an object of type off_t. Otherwise the process could create a lock which would be "beyond" the ability of the program to represent.
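The representability test for a lock region can be sketched as follows. The function name lock_region_representable is hypothetical, long long stands in for the offsets involved, and negative lengths are deliberately not handled.

```c
/* Returns 1 if a lock starting at `start` with length `len` can be
 * described using offsets no larger than `off_max`, 0 otherwise.
 * len == 0 means "to end of file", so only the start is checked. */
int lock_region_representable(long long start, long long len, long long off_max)
{
    if (start < 0 || len < 0 || start > off_max)
        return 0;
    if (len == 0)
        return 1;
    /* The last byte is start + len - 1; compare without overflowing. */
    return len - 1 <= off_max - start;
}
```

Note that a one-byte lock at offset off_max is accepted, which matches the existing behavior of locking the byte at the maximum representable offset.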
Programs typically, but incorrectly, fail to check the return value of these functions, which renders the error return less useful. On the other hand, returning an incorrect offset can result in serious malfunction as well.
An lseek() to the end of a file using

    lseek(fd, 0, SEEK_END);

is quite common. It is unfortunate that such calls fail on a too-large file since the return value is usually ignored. One alternative that was considered was for lseek() to move the file offset for all valid requests and then return an error if the resulting offset is too large. That is, the call would succeed for applications that do not check the return code, but also fail for applications that do check. This option was deemed too bizarre to adopt. For example, it might be difficult to implement using a remote procedure call system that was constructed to return either results or an error, but not both. In addition, the POSIX 1003.1 standard requires the file offset to remain unchanged if an error is returned by lseek(). It was felt that the open protection (see A.1.1.2 "Open Protection" Rule) and the read/write limit (see A.1.1.3 "Read/Write Limit" Rule) are more effective defenses against this problem.
Another potentially serious consequence of ignoring the return value of lseek() is that programs which extend data files by attempting to seek beyond the end-of-file and then writing may instead overwrite existing data.
For example, typical implementations of the dbm and ndbm libraries contain code such as:
    (void) lseek(db->dbm_pagf, blkno*PBLKSIZ, L_SET);
    if (write(db->dbm_pagf, pagebuf, PBLKSIZ) != PBLKSIZ)
        ... error handling ...
The problem is that the return code of lseek() is not checked and so if "blkno*PBLKSIZ" overflows the lseek() will fail (or will seek to an unintended offset) and the data will be written to an unintended offset.
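A more defensive version would check both the multiplication and the seek. The following sketch is not taken from any dbm implementation; block_offset and write_block are hypothetical names, long long stands in for the offset arithmetic, and SEEK_SET is used in place of the older L_SET.

```c
#include <sys/types.h>
#include <unistd.h>

/* Computes blkno * blksize, refusing results that exceed offset_max.
 * Returns 0 and stores the product in *out, or -1 on overflow. */
int block_offset(long long blkno, long long blksize,
                 long long offset_max, long long *out)
{
    if (blkno < 0 || blksize <= 0 || blkno > offset_max / blksize)
        return -1;
    *out = blkno * blksize;
    return 0;
}

/* Seeks to the block and writes it, checking every step. */
int write_block(int fd, long long blkno, const void *buf,
                size_t blksize, long long offset_max)
{
    long long off;

    if (block_offset(blkno, (long long)blksize, offset_max, &off) != 0)
        return -1;
    if (lseek(fd, (off_t)off, SEEK_SET) == (off_t)-1)
        return -1;      /* never write at an unintended offset */
    return write(fd, buf, blksize) == (ssize_t)blksize ? 0 : -1;
}
```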
The _PC_FILESIZEBITS option makes it possible for a process to determine how large a file can be created in a given directory. It takes into account implementation limitations in the file system (e.g. due to the size of file size and block count variables), and it takes into account long term policy limitations (e.g. due to the mount utility's -o nolargefiles option). It does not take into account dynamic restrictions such as the RLIMIT_FSIZE resource limit or the number of available file blocks, so the process must perform appropriate checks.
When the current directory is on a typical large file capable file system and is mounted with the -o nolargefiles option,

    pathconf(".", _PC_FILESIZEBITS);

will return 32. In general, if the maximum size file that could ever exist on the mounted file system is maxsize then the returned value is 2 plus the floor of the base 2 logarithm of maxsize.
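The formula can be checked with a few lines of C. This is an illustrative sketch only; the function name filesizebits is hypothetical and is not part of the proposed interface.

```c
/* Number of bits needed to represent `maxsize` as a signed integer:
 * 2 + floor(log2(maxsize)).  `maxsize` must be greater than zero. */
int filesizebits(unsigned long long maxsize)
{
    int log2_floor = -1;

    while (maxsize != 0) {      /* count how many times maxsize halves */
        maxsize >>= 1;
        log2_floor++;
    }
    return log2_floor + 2;
}
```

For maxsize equal to 2^31-1 the floor of the logarithm is 30, giving 32; for 2^63-1 it is 62, giving 64.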
When ftruncate() is used to increase the size of a file, the semantics are similar to a write() of zeroes to the file. For consistency with write(), the ftruncate() function will fail when the request is beyond the offset maximum (even if the effect of the request would be to shorten the file).
If setrlimit() fails for any reason (for example, EPERM), the resource limits and saved resource limits remain unchanged.
This proposal does not specify any particular value for RLIM_INFINITY, RLIM_SAVED_MAX or RLIM_SAVED_CUR. Typical current implementations use the value 0x7FFFFFFF for RLIM_INFINITY, and it is recommended that RLIM_SAVED_MAX and RLIM_SAVED_CUR have similar large values.
Few, if any, programs will need to refer explicitly to RLIM_SAVED_MAX or RLIM_SAVED_CUR. Those that do should not use them in C-language switch cases since they may have the same value in some implementations (see 2.2.2.3 <sys/resource.h>).
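A program that does need to distinguish these values can use a chain of comparisons instead of a switch, which remains valid when two or more of the symbols share a value. The sketch below is illustrative; the function name describe_limit and the returned strings are hypothetical.

```c
#include <sys/resource.h>

/* Classifies a limit value using if-tests rather than switch cases, so
 * that it still compiles when RLIM_SAVED_MAX, RLIM_SAVED_CUR and
 * RLIM_INFINITY have the same value. */
const char *describe_limit(rlim_t value)
{
    if (value == RLIM_INFINITY)
        return "no limit";
    if (value == RLIM_SAVED_MAX)
        return "saved hard limit";
    if (value == RLIM_SAVED_CUR)
        return "saved soft limit";
    return "representable limit";
}
```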
A limit that can be represented correctly in an object of type rlim_t is either "no limit", which is represented with RLIM_INFINITY, or has a value not equal to any of RLIM_INFINITY or RLIM_SAVED_MAX or RLIM_SAVED_CUR and which can be represented correctly in an object of type rlim_t and which meets any additional implementation-specific criteria for correct representation.
A rejected alternative proposal was to map limits that could not be represented to and from RLIM_INFINITY. This would avoid the need for the new symbols RLIM_SAVED_MAX and RLIM_SAVED_CUR. But such mapping would arguably be a lie, and the resulting information loss would cause unintuitive program behavior, especially in programs running with appropriate privileges needed to raise hard limits.
A rejected alternative proposal was that if getrlimit() could not correctly return a current limit then it should instead return -1 and set errno to EOVERFLOW. But that would result in unnecessary breakage of programs. (Note that this breakage occurs even when no large files are present.) It would also result in malfunction of programs that assume that they are calling getrlimit() properly and so failure "cannot happen". For example, in the 4.4 BSD-Lite distribution, there are at least 15 unchecked calls to getrlimit(). When the 4.4 BSD csh limit function is used to report the current limits, there is no check of the return code and so the reported results can be entirely incorrect. Also, non-superuser programs typically unlimit themselves with:
    getrlimit(RLIMIT_STACK, &rl);
    rl.rlim_cur = rl.rlim_max;
    setrlimit(RLIMIT_STACK, &rl);

If the getrlimit() fails then garbage is passed to setrlimit() which may result in an unwanted and extremely restricted limit. Several utilities that are part of the GNU C compiler have this problem.
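The same sequence with the return values checked might look like the following sketch. The function name raise_soft_limit is hypothetical.

```c
#include <sys/resource.h>

/* Raises the soft limit for `resource` to its hard limit.
 * Returns 0 on success, or -1 if either call fails, in which case
 * the limits are left as they were. */
int raise_soft_limit(int resource)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;          /* do not pass garbage to setrlimit() */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(resource, &rl) == 0 ? 0 : -1;
}
```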
In addition, the program needs to be checked for file size related variables such as offsets, line numbers, and block counts that must be converted to a large off_t or related type. These variables typically appear inside loops that are performing input and/or output.
Vendor and third-party backup software is also unable to support large files and will require modification in order to do so.
Typical core utilities must be compiled in a "large" off_t compilation environment or must use the transitional APIs. Using the compilation environment reduces the number of editing changes required to port a program, but it does not reduce the effort required to ensure the correctness of the port.
The chgrp, chmod, chown, ln, and rm utilities probably require use of large file capable versions of stat(), lstat(), ftw(), and the stat structure.
The cat, cksum, cmp, cp, dd, mv, sum, and touch utilities probably require use of large file capable versions of creat(), open(), and fopen().
The cat, cksum, cmp, dd, df, du, ls, and sum utilities may require writing large integer values. For example,
The dd, find and test utilities may need to interpret command arguments that contain 64-bit values. For dd the arguments include skip=n, seek=n, and count=n. For find the arguments include -size n. For test the arguments are those associated with algebraic comparisons.
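As an illustration of interpreting such an argument, the sketch below parses a decimal count into a 64-bit value and rejects input that does not fit. It assumes a C library that provides strtoll(); the function name parse_count is hypothetical, and suffixes or multipliers that a real dd accepts are not handled.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses a non-negative decimal count such as the n in skip=n.
 * Returns 0 and stores the value in *out, or -1 if the text is not a
 * complete number or does not fit in long long. */
int parse_count(const char *text, long long *out)
{
    char *end;
    long long value;

    errno = 0;
    value = strtoll(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE || value < 0)
        return -1;
    *out = value;
    return 0;
}
```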
The df utility might need to access large file systems with statvfs().
The ulimit utility will need to use large file capable versions of getrlimit() and setrlimit() and be able to read and write large integer values.
Conversion between off_t (or other derived types) and ASCII is unspecified, which is a significant practical deficiency. This is being considered by other groups. For example, see: ftp://ftp.dmk.com/DMK/sc22wg14/c9x/extended-integers/
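One approach that later C libraries make possible is to convert the value to the widest integer type before formatting it. The sketch below assumes a C library with intmax_t and the "%jd" conversion, neither of which is required by this specification; the function name format_offset is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Formats `off` as decimal text in buf.  Returns the number of
 * characters written, or -1 if the buffer is too small. */
int format_offset(off_t off, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%jd", (intmax_t)off);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```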
The offset maximum used for shell input and output redirections is implementation-specific. Some vendors prefer to use the smallest supported off_t, others prefer the largest.
    fcntl(fd, F_SETFL, O_APPEND);

This is incorrect because it turns off all the other open flags, including O_LARGEFILE. Instead, to turn on append mode one should first use F_GETFL to get the current flags:

    int oflag = fcntl(fd, F_GETFL, 0);

then include O_APPEND in the flags:

    oflag |= O_APPEND;

and then set the new flags:

    fcntl(fd, F_SETFL, oflag);

A more complete example would also check for fcntl() failures.
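Such a version might be sketched as follows. The function name set_append is hypothetical.

```c
#include <fcntl.h>

/* Turns on O_APPEND without disturbing the other file status flags
 * (including O_LARGEFILE where it exists).  Returns 0 on success
 * or -1 if either fcntl() call fails. */
int set_append(int fd)
{
    int oflag = fcntl(fd, F_GETFL, 0);

    if (oflag == -1)
        return -1;
    if (fcntl(fd, F_SETFL, oflag | O_APPEND) == -1)
        return -1;
    return 0;
}
```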
Since a new control block is needed, new interfaces are required for all of the existing aio interfaces since every one takes a pointer to the control block as an argument.
This macro does not affect the size of off_t (see 3.3.3 Mixed API and Compile Environments Within a Single Process).
This macro does not affect the size of off_t (see 3.3.3 Utilities: Optional Method for Specifying the Size of an off_t).
If _LARGEFILE64_SOURCE is defined then _LARGEFILE_SOURCE is implied so it need not also be defined (see 3.3.1 Compilation Environment - Visibility of Additions to the API). Similarly, if _LFS64_LARGEFILE is defined then _LFS_LARGEFILE will be defined so it need not also be tested.
Mixing the standard and transitional APIs is relatively safe, since data types have the same meaning in every file. This mixing permits a smoother and faster migration to a larger off_t environment, because it permits asynchronous upgrades. For example, it permits libraries to be made large file aware without requiring large file awareness in all the programs which use the library or in all the libraries which the library uses. (This is true both for static and for shared libraries.) This is particularly beneficial for situations in which the system vendor, one or more third-party suppliers, and the end user may all be supplying libraries or other objects that are components of a complete program.
If the size of off_t is controlled by a preprocessor macro variable then it is recommended that the macro be named _FILE_OFFSET_BITS and be supported as follows:
For POSIX compatibility this method must not be affected by the #undef preprocessor directive. For example:

    #undef lseek

must not alter the size of type off_t in use for a call to lseek().
The functions that might be affected by this option are listed in 3.1.1.1 64-bit Versions of Interfaces.
The types, structures and symbolic constants that might be affected by this option are listed in 3.1.2.1 64-bit Versions of Headers.
It has been argued that there should be a new mode bit (or "magic number") on executable images to indicate whether or not the application is large file aware. This is not precluded by this specification. However, an argument against it is that it requires significant work. Specifically, kernel, compiler, loader, and library changes are needed. It is unclear how the mode bit would support a large file aware application that makes calls to a non-aware shared library.