Finer details of Fortran

Some of the finer details of Fortran

Fortran is well described in a number of books, but not all aspects of it will become clear just from reading and studying them. This page is intended to highlight a few details that can be easily overlooked and to describe a few styles of programming that may be unfamiliar to most Fortran programmers.

Compiler extensions and limitations
Modules and generic source code
Functional programming
Using work arrays
Another look at generic source code
Read-only components in a derived type Read-only variables The singleton pattern A little known aspect of direct access files

Compiler extensions and limitations

The Fortran standard specifies the Fortran language in detail but it also intentionally leaves a number of things up to the compiler writers. Furthermore, compiler writers can add options or features that are not covered by the standard or that relax some restrictions.

This section is intended to list a few of the aspects that can cause surprises.

First of all, not all compilers implement the complete standard. The features listed below are not supported by at least one compiler:

Using character strings with a "dynamic" length in subroutines and functions:

    function double_string( string ) result(r)
        character(len=*)            :: string
        character(len=2*len(string) :: r

        r = string // string
    end function

Allocatable elements in derived types. This is an official extension to the Fortran 95 standard (Technical report ISO/IEC 15581) and is part of Fortran 2003. It is implemented with many current compilers. For example:
```
    type grid
        real, allocatable, dimension(:,:) :: x, y
    end type
```
If you do not want to rely on this, use the pointer attribute instead.

Then there are a number of extensions - features that are accepted by some compilers, but that would flag errors with others:

Use of character strings of different length in array constructions:
```
    ingredients = (/ 'eggs', 'butter', 'flour', 'salt', 'water' /)
```
Strictly speaking, strings of different length have different types. Always use strings with the same length (this also applies to the merge() function).
Some compilers allow .t. and .f. as alternatives to .true. and .false. This means that you can not use .t. and .f. as user-defined operators.
Quite from the question whether such short names are wise, do not use .t. or .f. for this purpose.
Reading integer numbers with list-directed input (*) may not always return an error when the input is not a proper integer:
```
    integer :: ivalue
    character(len=20) :: string
    string = '12.34'
    read( string, * ) ivalue
```
may fail with one compiler but happily succeed with another.
Conclusion: if you need to check whether a string is a valid representation of an integer (or a real), do not rely on list-directed input (see: Checking integer values).

Modules and generic source code

The FLIBS collection of Fortran source files contains several files that are intended for generic purposes, most notably the files in the datastructures directory.

While "true" genericity may not be possible with the current standard (Fortran 90/95), we can achieve a lot with just a few devices that are available:

The INCLUDE allows us to incorporate pieces of code in a source file at compile-time. This can be used to avoid replication of these fragments of source code.
Putting the source in modules makes it possible to selectively import the items we need under names that are convenient within the context where we want them.

Both these techniques are used extensively in the examples/test programs. (I even wrote an article for the Fortran Forum journal using these techniques to show how design patterns, well-known from the Java and C++ world, can be applied in a Fortran programs.)

Here is an example (from "test_queue.f90"):

module MYDATA_MODULE

type MYDATA
    character(len=20) :: string
end type MYDATA

end module

module MYDATA_QUEUES
    use MYDATA_MODULE, QUEUE_DATA => MYDATA

    include "queues.f90"
end module MYDATA_QUEUES

The code in the file "queues.f90" is completely independent of the type of data we want to store in the queues.

The main limitation of this style of genericity is that you can only store data of the same type in such queues. When you need queues for more than one data type, you can follow the method of example "two_lists.f90" in the tests directory.

Another limitation is that you can only store derived types in this way - basic types like "INTEGER" or "REAL" can not be renamed (and the syntax gets in the way: type(QUEUE_DATA) versus REAL for instance.)

Whether these limitations are prohibitive, depends entirely on the problem at hand: if you do need to store data of different types in a single "container", then you also need some way of identifying the type. And type safety is becoming an issue - will you always use the contained data in the right way (for instance pass it to routines that can handle such data)?

In principle it is possible to store data of different types in a single container like the queues above: via the TRANSER() function. But if that can be avoided, your program will probably be simpler to understand.

Importing from modules

Combining interfaces

Character strings with arbitrary lengths

Returning a pointer?

Using work arrays

Often you need a work array of a size related to the size of the problem that you are trying to solve, for instance the transpose of a matrix or a few arrays that hold the intermediate steps in a numerical integration method for systems of differential equations.

As long as the work arrays are "small", you can use automatic arrays, but the problem is that these can cause a stack overflow - resulting in an abrupt abortion of the program without a chance to catch it.

The alternative is to allocate an array (alocatable or pointer) of the appropriate size, but that has a performance penalty - memory allocation is time-consuming. You also need to take care that the memory is deallocated again (though in Fortran 95 this is automatically taken care of by the compiler under certain circumstances - allocatable arrays that do not have the target or the save attribute). The advantage, however, is that runtime errors can be caught, so that you can write a decent error message to the screen or log file.

Can these two approaches be combined? Yes, they can, as the following example shows:

subroutine mkwork( size_problem )
    integer :: size_problem
    real, dimension(min(100,size_problem), target :: work1
    real, dimension(:), pointer                   :: work

    !
    ! Decide which of the two to use
    ! (really requires Fortran 95, because of the
    ! status of the pointer)
    !
    if ( size_problem <= 100 ) then
        work => work1
    else
        allocate( work(1:size_problem) )
    endif

    ... Do the job

    !
    ! Free the work array, when it was allocated
    !
    if ( .not. associated( work, work1 ) ) then
        deallocate( work )
    endif
end subroutine mkwork

Another way to solve the problem, is simply to use allocatable/pointer arrays with the save attribute:

subroutine mkwork( size_problem )
    integer :: size_problem
    real, dimension(:), allocatable, save :: work

    if ( .not. allocated(work) ) then
        allocate( work(size_problem) )
    else
        if ( size_problem >= size(work) ) then
           deallocate( work )
           allocate( work(size_problem) )
        endif
    endif

    ... Do the job

    !
    ! We do not free the memory!
    !
end subroutine mkwork

That way you keep the cost of memory allocation down to a minimum: you simply allocate a new array when encountering a problem of a larger size. Any smaller problem can be solved with the previously allocated memory.

The drawback is that you have a small memory leak to worry about, but it is limited as there is only one chunk of memory per subroutine.

Measuring performance

Distinguising "missing values"

The perils of double precision

Quantum computing

Functional programming

One well-known feature that Fortran 90 introduced to the Fortran language is the ability to handle arrays as a whole, instead of merely via their elements:

real, dimension(1:100) :: a, b

b = 2*a

This is in fact a simple form of functional programming. In languages like LISP, Haskell or ML the fundamental data type for a collection of data is the list. While not at all equivalent to an array in Fortran, these lists are often used via operations, such as map and filter that are quite easy to simulate in Fortran, or are simply present in another guise:

The map operation applies a function to each element of a list. This is hardly different from the fragment above - the map operation is applied to arrays instead of lists.
The standard function pack in Fortran allows you to filter out unwanted elements from an array based on a logical criterium. This is completely analoguous to the filter operation in Haskell or ML.

Here are two practical examples of a functional programming style in Fortran:

Filtering out data points
In a library that deals with spatial interpolation the data points that are passed in may contain "missing values". The interpolation methods are however much easier to implement if there are no missing values. So, this library contains code like:

   real, dimension(:), intent(in)  :: x, y, value
   logical, dimension(1:size(x))   :: accepted

   real, dimension(:), allocatable :: xf, yf, valuef ! Filtered data points

   accepted = value .ne. -999.0
   number_data = count(accepted)

   allocate( xf(1:number_data), yf(1:number_data), valuef(1:number_data) )

   xf     = pack( x,     accepted )
   yf     = pack( y,     accepted )
   valuef = pack( value, accepted )

The alternative is to use an explicit loop:

   j = 0
   do i = 1,size(x)
      if ( accepted(i) ) then
         j = j + 1
         xf(j)     = x(i)
         yf(j)     = y(i)
         valuef(j) = value(i)
      endif
   enddo

Clearly this is lengthier and easy to get wrong.

Finding the closest point
In that same library it is sometimes necessary to find the data point that is closest to the target point (to avoid extrapolation). Here is a fragment that will do that:

   real, dimension(:), intent(in)  :: x, y, value
   real                            :: xtarget, ytarget
   integer, dimension(1)           :: idx

   idx = minloc( (x - xtarget) ** 2 + (y -ytarget) ** 2 )

   write(*,*) 'Closest point: ', x(idx(1)), y(idx(1))

(The only unfortunate thing about this is the fact that we have to declare a short array idx instead of an ordinary variable). Compare this to the alternative:

   real, dimension(:), intent(in)  :: x, y, value
   real                            :: xtarget, ytarget
   real                            :: dist, distmin
   integer                         :: idx

   idx = -1
   do i = 1,size(x)
      dist = (x(i) - xtarget) ** 2 + (y(i) -ytarget) ** 2
      if ( idx .eq. -1 .or. dist .lt. distmin ) then
         idx = i
         distmin = dist
      endif
   endif

   write(*,*) 'Closest point: ', x(idx), y(idx)

We have an explicit do-loop and we need to take care of the initial case, done here by a special (impossible) value for the target.

Checking integer values

Here is a small function to reliably check if a string is a valid representation of a single integer value:

logical function okay(string)
    character(len=*) :: string
    character(len=20)          :: form
    integer                    :: ierr
    integer                    :: ival

    write( form, '(a,i0,a)' ) '(i', len(string), ')'
    read( string, form, iostat = ierr ) ival

    okay = ierr == 0

end function okay

As discussed above, it is not enough to read the string with list-directed input (the very powerful "read(...,*)", you need to use formatted input.

If the string contains several integers, like "12, 34, ...", you can check the first value like this:

logical function okay(string)
    character(len=*)            :: string
    character(len=len(string))  :: first
    character(len=20)           :: form
    integer                     :: ierr
    integer                     :: ival

    okay = .false.
    read(string, *, iostat = ierr ) first

    if ( ierr == 0 ) then
        write( form, '(a,i0,a)' ) '(i', len(string), ')'
        read( first, form, iostat = ierr ) ival
    endif

    okay = ierr == 0

end function okay

Another look at generic source code

One vexing problem of modern programming practice is that type safety and flexibility do not always go together. Here is one such dilemma:

Solve a system of ordinary differential equations:

    x'  =  f(t,x,a),  t = 0,...,T

    x(t=0) = x0

where x is a vector of N dependent variables, f a function describing the derivative of x w.r.t. time and a is a set of parameters.

We would like to make a general library for solving this type of systems. The start is simple: Define a routine that takes a parameter f to represent the right-hand side (the name of a function subprogram) and the routine can be any convenient numerical method.

But then we run into problems: how to represent the set of parameters a? For one problem an array of reals might do, for another we would need a more complicated data structure.

The numerical method is independent of this parameter - just pass it around. But Fortran does not allow us to define a generic type. Mucking about with the transfer() function is one solution but you can not hide it from the user then.

Another problem, perhaps more mundane, is that we would like to print (output, save) intermediate results. But that leads to a second routine we would like to pass to the generic, mathematical algorithm.

Can we solve this problem in a more general, indeed generic way? Yes, we can: using reverse communication. But that leads to a complicated interface:

We need to initialise everything for the solution process
In a loop we call the stepwise solution procedure and examine the action we are expected to take
We compute the derivative or we print the intermediate result
Finally, we have the end result and return

It gives us full flexibility in implementing the computation of the derivative and the printing of the intermediate result, we do not have to interfer with the library of numerical algorithms in any way.

But the user does have to program this loop. And we would like to hide the gory details.

This is actually possible and here is a sketch of how to do that:

Define a useful interface for the function f:

     function f( t, x, a ) result( r )
         real(kind=wp)                     :: t
         real(kind=wp), dimension(:)       :: x
         type(params)                      :: a
         real(kind=wp), dimension(size(x)) :: r
     end function

Define one for the printing routine:

     subroutine print( t, x, a )
         real(kind=wp)                     :: t
         real(kind=wp), dimension(:)       :: x
         type(params)                      :: a  ! a stores the LU-number as well
     end subroutine

Define a solution method (integrate from t0 to t1):

     subroutine solve( f, t0, t1, x, a )
         use solution_library
         interface
             function f( t, x, a ) result( r )
                 use params_module
                 real(kind=wp)                     :: t
                 real(kind=wp), dimension(:)       :: x
                 type(params)                      :: a
                 real(kind=wp), dimension(size(x)) :: r
             end function
         end interface

         real(kind=wp)                     :: t0
         real(kind=wp)                     :: t1
         real(kind=wp), dimension(:)       :: x  ! Initial condition at
                                                 ! start, final result at end
         type(params)                      :: a  ! a stores the LU-number as well

         include 'generic_part.inc'

     end subroutine solve

The generic part in the file "generic_part.inc" is not entirely independent of the structure type(params), which is why we need the source code, but we do not need to change anything if it changes:

      !
      ! Sketch only
      !
      type(solution_struct)              :: solution  ! Required for bookkeeping

      call init_solution( solution, t0, t1, x )
      t = t0

      do
           call step_solution( solution, t, x, task )

           select case( task )
               case( print )
                   call print( t, x, a )
               case( deriv )
                   x = f( t, x, a )
               case ( completed )
                   exit
           endselect
      enddo

The user of the library does not have to change this generic part and the author of the library does not have to deliver any other source code.

Read-only components in a derived type

Fortran 90/95 allows components (fields) of a derived type to be either all public or all private. If they are public, the program that uses such a type can do whatever it likes with the components. But what if you want read-only components? So, components that a program can use directly but not change.

You can of course use a "get function to get the value of the component:

    type personal_data
        private
        integer :: year_of_birth
        ...
    end type

    integer function year_of_birth( data )
        type(personal_data) :: data

        year_of_birth = data%year_of_birth

    end function

This can be cumbersome in computations, as accessing the data has a different form than using them directly: year_of_birth(data) versus data%year_of_birth.

With a bit more work (but that can be made general enough), you can create read-only components (note that all direct access to the read-only integer is excluded, unless the user explicitly uses the read_only_types module:

    use read_only_types

    private :: set_ro_integer, ro_integer ! Imported from read_only_types, but they
                                          ! should not be passed on

    type personal_data
        type(ro_integer) :: year_of_birth
        ...
    end type

The module read_only_types looks like this:

module read_only_types

    type ro_integer
        private
        integer :: value
    end type

    !
    ! Interfaces to computational facilities
    !
    interface assignment(=)
        module procedure set_int_to_ro_int
    end interface

    ! etc. ...

contains
!
! To be used by trusted modules only
!
subroutine set_ro_integer( ro_value, value )
    type(ro_integer), intent(out) :: ro_value
    integer, intent(in)           :: value

    ro_value%value = value

end subroutine

!
! Publically available routines
!
subroutine set_int_to_ro_int( value, ro_value )
    integer, intent(out)         :: value
    type(ro_integer), intent(in) :: ro_value

    value = ro_value%value

end subroutine

! etc. ...

end module

With this technique, the user can freely use the component year_of_birth without being able to override the value, unless doen via the facilities offered by the encompassing personal_data type.

There is one caveat: you can not simply print the value of data%year_of_birth, as the contents of that component are private...

Here is a complete, slightly silly program that uses the above:

module read_only_types

    type ro_integer
        private
        integer :: value
    end type

    !
    ! Interfaces to computational facilities
    !
    interface assignment(=)
        module procedure set_int_to_ro_int
    end interface

    ! etc. ...

contains
!
! To be used by trusted modules only
!
subroutine set_ro_integer( ro_value, value )
    type(ro_integer), intent(out) :: ro_value
    integer, intent(in)           :: value

    ro_value%value = value

end subroutine

!
! Publically available routines
!
subroutine set_int_to_ro_int( value, ro_value )
    integer, intent(out)         :: value
    type(ro_integer), intent(in) :: ro_value

    value = ro_value%value

end subroutine

! etc. ...

end module

module personal_data_facilities

    use read_only_types

    private :: set_ro_integer, ro_integer ! Imported from read_only_types, but they
                                          ! should not be passed on

    type personal_data
        type(ro_integer) :: year_of_birth
    end type

contains
subroutine set_birth_year( data, year )
    type(personal_data) :: data
    integer             :: year

    call set_ro_integer( data%year_of_birth, year )

end subroutine
end module

program test_pd
    use personal_data_facilities

    type(personal_data) :: data
    integer :: year

    call set_birth_year( data, 2000 )

    year = data%year_of_birth

    write(*,*) 'Year: ', year
   !write(*,*) 'Year: ', data%year_of_birth ! Alas, this does not work

end program

Read-only variables

The above technique to implement read-only components in a derived type can be used for individual variables as well:

module read_only_variables

    type ro_integer
        private
        integer :: value
    end type

    !
    ! Interfaces to computational facilities
    !
    interface assignment(=)
        module procedure set_int_to_ro_int
    end interface

    ! etc. ...

    type(ro_integer) :: ro_variable = ro_integer(10) ! Our read-only variable

contains

!
! Publically available routines
!
subroutine set_int_to_ro_int( value, ro_value )
    integer, intent(out)         :: value
    type(ro_integer), intent(in) :: ro_value

    value = ro_value%value

end subroutine

! etc. ...

end module

program test_ro
    use read_only_variables

    integer :: value

    !
    ! This works:
    !
    value = ro_variable

    !
    ! But this does not:
    !
    value = 30
    ro_variable = value

end program

The singleton pattern

Design patterns are a very popular subject among people using object-oriented programming languages. One such pattern is the Singleton pattern: basically an object of which there can be only one. That one object can be accessed from any part in the program. It is useful to store data in that are in some way global.

As Fortran 90/95 does not support object oriented programming perse, is it impossible to use the Singleton pattern? Let us examine the properties of such singleton objects:

There is only one such object in the program
You can get access to it from anywhere in the program, by merely calling a convenient function that returns the handle to that object.
As it is, in all other respects, an ordinary object, you can retrieve data from it and store new data.

Now, aren't these exactly the properties one can ascribe to a module? Variables contained in the module are unique in the program, they can be accessed simply by using the module. You can not create more than one instance ... so, here we have the Singleton pattern de facto.

A little known aspect of direct access files

Direct access files in Fortran have a quirk: the unit in which the record length is specified may be a byte or it may be a word. That is: with some compilers the record can be four times as long as you think it is.

This means that direct access files require a bit more care than you might think (actually, there are or have been compilers that put a header at the top of the file to indicate the record length, so the situation is even worse, in its generality).

Let us focus on the more common situation though that the record length may or may not be expressed in bytes. Here is a small routine that determines the unit of record length in bytes:

   subroutine bytes_in_rec( bytes )
   implicit none
   integer     bytes

   character*8 string
   integer     i
   integer     ierr

   open( 10, status = 'scratch', access = 'direct', recl = 1 )
   do i = 1,8
       write( 10, rec = 1, iostat = ierr ) string(1:i)
       if ( ierr /= 0 ) exit
       bytes = i
   enddo

   close( 10, status = 'delete' )

   end subroutine bytes_in_rec

The routine simply tries to write as many characters (bytes) to a record of length 1 as possible. The maximum for which this is possible without a runtime error is taken to be the number of bytes in the unit of record length.

-- Arjen Markus, dd. february 2009