ABSTRACT: Establishing the extent and proportion of live biomass and dead shell in estuary mussel beds has typically been carried out on foot during intertidal periods at low water springs (LWS). Such periods are necessarily short, so estimates of biomass versus shell must be made on a sampling basis in order to cover sufficient ground. (fig 1)
Given the limited time available and the patchy, non-uniform nature of the ground, there is potential for significant error. For larger beds, foot mapping may need to extend over several tides, extending the project in both time and cost. A previous proof-of-concept project addressing these issues successfully demonstrated that a lightweight UAV could provide a photographic overview of an entire 3 ha mussel bed without user intervention.
However, the technical limitations of the UAV (short flight times) and of the post-processing photo-stitching techniques deployed at the time (Hugin) suggested that further development was necessary to achieve an outcome which might usefully complement the data ordinarily collected by hand. Since stitching is a purely 2D process, no depth data was available and thus no direct assessment of biomass volume was possible.
In this new project, using better equipment, more accurate flight data, longer flight times and significant image overlap, the intention is to demonstrate the validity of 'Structure from Motion' (SfM) image processing techniques in real-world situations as a useful and practical means of extracting depth information and volume directly.
The Mission: To cover the planned survey area, 2 missions were flown by a standard DJI Phantom 2 Vision+ UAV equipped with the standard FC200 camera and its 5 mm fisheye lens. Mission parameters are shown in tbl 1; as can be seen, each mission used a different altitude and number of tracks in order to meet the area requirements and the flight time limitations imposed by the platform. These parametric differences also provided the opportunity of validating the performance of the reconstructive process, as evidenced by the outputs exemplified in figs 7-9.
tbl 1 - Flight mission parameters: altitude, number of tracks, distance travelled (m).
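As a rough cross-check on the coverage figures in tbl 1, the ground footprint of each image and the overlap between neighbouring tracks can be estimated from the flight altitude and the camera geometry. The sketch below assumes a pinhole approximation after distortion correction, nominal FC200 values (5 mm focal length, ~6.2 x 4.6 mm sensor) and a hypothetical 20 m track spacing; none of these figures are taken from the mission logs.

```python
# Rough coverage check for the mission parameters in tbl 1.
# Pinhole approximation after fisheye correction; the FC200 figures
# (5 mm focal length, ~6.2 x 4.6 mm sensor) and the 20 m track spacing
# are nominal assumptions, not values from the mission logs.

def ground_footprint(altitude_m, focal_mm=5.0, sensor_mm=(6.2, 4.6)):
    """(width_m, height_m) of the ground patch covered by one nadir image."""
    return tuple(altitude_m * s / focal_mm for s in sensor_mm)

def overlap_fraction(footprint_m, spacing_m):
    """Fractional overlap between neighbouring images or tracks."""
    return max(0.0, 1.0 - spacing_m / footprint_m)

for alt in (40, 50):                       # the two mission altitudes (m)
    w, h = ground_footprint(alt)
    print(f"{alt} m: footprint {w:.0f} x {h:.0f} m, "
          f"side overlap at 20 m track spacing {overlap_fraction(w, 20):.0%}")
```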
The UAV tracks for the 2 missions superimposed on a Google Earth image. Ordinarily the GPS parameters embedded in the image EXIF data would not be precise enough for accurate tracking, so the data was acquired from a separate on-board flight data recorder.
Two successive raw images taken along the track. Note the barrel distortion (an artifact of the fisheye lens) and the low contrast due to very flat lighting and the natural characteristics of the sandy terrain. Feature detection and matching are rendered more difficult under these non-ideal conditions.
As above, corrected for barrel distortion and contrast-enhanced to improve feature detection and matching.
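This correction step can be sketched with OpenCV: fisheye undistortion followed by CLAHE contrast enhancement. The intrinsic matrix K and distortion coefficients D below are placeholders, not the actual FC200 calibration; a real pipeline would estimate them from a calibration target.

```python
import cv2
import numpy as np

# Sketch of the pre-processing step shown above: fisheye undistortion followed
# by CLAHE contrast enhancement. K and D are placeholder values, not the real
# FC200 calibration; fx/fy assume a 5 mm lens at ~1.4 um pixel pitch and the
# principal point sits at the centre of an assumed 4384 x 3288 frame.
K = np.array([[3536.0,    0.0, 2192.0],
              [   0.0, 3536.0, 1644.0],
              [   0.0,    0.0,    1.0]])
D = np.array([-0.05, 0.01, 0.0, 0.0])   # equidistant fisheye coefficients (assumed)

img = cv2.imread("frame_0001.jpg")       # hypothetical filename

# Remove the barrel distortion with OpenCV's fisheye camera model.
undist = cv2.fisheye.undistortImage(img, K, D, Knew=K)

# Enhance local contrast on the lightness channel only, to help feature
# detection on flat, low-contrast sandy terrain.
lab = cv2.cvtColor(undist, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
result = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)
cv2.imwrite("frame_0001_corrected.jpg", result)
```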
Structure from Motion: SfM derives from the human perception of 3D structure through relative motion.
This mechanism has been applied to photogrammetric 3D object reconstruction by identifying corresponding features, such as corner points, in and between overlapping two-dimensional images as a camera moves through the environment.
The 3D reconstructive process implemented here is well known and originates from the work of ChangChang Wu and Michal Jancosek. It is not the latest or fastest implementation, but its binaries are freely available (although partly closed source) and well documented, and some useful explanatory material has appeared more recently. Since its publication as a research paper it has been commercialised with significant speed improvements.
'VisualSfM' by ChangChang Wu and 'CMPMVS' by Michal Jancosek provide the essential programmes necessary for generating the camera poses, feature detection, sparse and detailed point clouds, and various mesh reconstructions and imagery output. The feature detector is the Scale-Invariant Feature Transform (SIFT), published in 1999 and patented in the US by the University of British Columbia. VisualSfM exploits GPU multicore parallelism for feature detection, feature matching and bundle adjustment, achieving processing times orders of magnitude better than with a CPU alone.
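For illustration, the detect-and-match step can be reproduced on a pair of overlapping frames with OpenCV's CPU implementation of SIFT (VisualSfM itself uses the GPU-based SiftGPU). The filenames are hypothetical; the 0.75 ratio-test threshold follows Lowe's original paper.

```python
import cv2

# Feature detection and matching on two overlapping frames: keypoints and
# descriptors from SIFT, then Lowe's ratio test to keep unambiguous matches.
img1 = cv2.imread("frame_0001_corrected.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_0002_corrected.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match each descriptor to its two nearest neighbours and keep a match only
# when it is clearly better than the runner-up (ratio test).
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(kp1)}/{len(kp2)} keypoints, {len(good)} good matches")
```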
For this implementation a Dell XPS-8300 machine was used, running Windows 10 and equipped with 16 GB RAM, a 4-core i7 3.4 GHz processor and an NVIDIA GTX 970 GPU (1664 cores). Even with this GPU support, typical processing times ran into hours, so running without GPU support is unlikely to be a realistic option other than for the simplest of cases (few images).
Figs 7-8 show output from the first stage of the reconstructive process: composite views of the missions showing the 170 camera positions (the coloured rectangles) superimposed over a sparse point cloud representing the ground. Note that the positions and orientations (poses) of the cameras are computed entirely by 3D triangulation from matching features within the image content, not from the attached EXIF or GPS data (tbl 1).
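A minimal two-view version of this pose computation, continuing the matching sketch above, recovers the relative camera pose from the matched features alone via the essential matrix and then triangulates a sparse cloud. K is the same assumed intrinsic matrix as before; without ground control the reconstruction scale is arbitrary.

```python
import numpy as np
import cv2

# Continuing the two-frame sketch: recover relative camera pose purely from
# matched image features (no GPS/EXIF), then triangulate a sparse cloud.
# kp1, kp2, good and the assumed K come from the earlier sketches.
pts1 = np.float64([kp1[m.queryIdx].pt for m in good])
pts2 = np.float64([kp2[m.trainIdx].pt for m in good])

E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)   # relative rotation/translation

# Projection matrices for the two views, with the first camera at the origin.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])

# Triangulate the matched points into homogeneous 3D coordinates.
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
cloud = (pts4d[:3] / pts4d[3]).T                 # sparse cloud, up to scale
print(cloud.shape, "points (scale is arbitrary without ground control)")
```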
The apparent curvature of the ground and camera planes is a consequence of incomplete or inaccurate lens distortion compensation. Fig 9 is a ground projection of pixel size (in effect the Ground Sampling Distance, GSD) showing colour differences corresponding to the 2 missions, which were flown at altitudes of 40 m (left) and 50 m (right).
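The GSD itself follows directly from altitude, focal length and pixel pitch, and explains the difference between the two missions in fig 9. The figures below are nominal FC200 assumptions (1/2.3-inch sensor, ~4384 pixels across ~6.2 mm), not calibrated values.

```python
# Ground Sampling Distance (GSD) sketch for the two mission altitudes.
# Sensor and pixel-count values are nominal FC200 assumptions.
FOCAL_MM = 5.0
PIXEL_PITCH_MM = 6.2 / 4384          # sensor width / horizontal pixel count

for altitude_m in (40, 50):
    gsd_m = altitude_m * PIXEL_PITCH_MM / FOCAL_MM
    print(f"{altitude_m} m altitude -> GSD ~ {gsd_m * 100:.1f} cm/pixel")
```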
Sparse point cloud and camera poses output from VisualSfM viewed from above.
A smoothly rendered part of the mesh. Note the colours of the points and mesh vertices are conveyed from the original image.
The same part of the mesh rendered with flat faces. Colouring of the mesh faces is accomplished by interpolating the colours of the adjoining points.
An Orthophoto derived from the point cloud
A perspective view of part of the 3D rendered mesh showing some depth features
Reliably discriminating object height variations of, for example, 1 cm at a camera range of 50 m (1 part in 5000, a reasonable minimum requirement for this surveying application) represents a significant technical challenge given the likely sub-optimal light conditions, ground subject definition and contrast, camera stability, motion blur and lens distortions.
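To put the 1-part-in-5000 figure in context, standard two-view error propagation gives an expected height precision of roughly Z^2 * sigma_d / (B * f_px), with Z the range, B the baseline between exposures, f_px the focal length in pixels and sigma_d the matching accuracy. With plausible (assumed) values the result already sits close to the 1 cm target, illustrating why sub-pixel matching and good source imagery matter.

```python
# Rough two-view height-precision estimate: sigma_Z ~ Z^2 * sigma_d / (B * f_px).
# Baseline and matching accuracy are assumptions chosen only to illustrate
# the 1 cm @ 50 m requirement; f_px uses the nominal FC200 figures.
Z = 50.0                       # camera-to-ground range (m)
f_px = 5.0 / (6.2 / 4384)      # focal length in pixels (assumed values)
B = 20.0                       # baseline between exposures (m, assumed)
sigma_d = 0.5                  # feature matching accuracy (pixels, assumed)

sigma_z = Z**2 * sigma_d / (B * f_px)
print(f"Expected height precision: ~{sigma_z * 100:.1f} cm")   # ~1.8 cm here
```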
Process outputs from VisualSfM and CMPMVS achieved so far are encouraging but suggest that much further work is still required. It is anticipated that improvements in source image quality and alternative processing algorithms may yield better results.
Estuarial surveys carried out by the Devon and Severn Inshore Fisheries Conservation Authority (D&SIFCA) monitor the distribution and quantity of crab tiles to ensure that they are in accordance with local byelaws and codes of conduct. This is carried out every 4 years and, until 2016, was implemented by local volunteers during low water springs intertidal periods using hand-held GPS receivers and paper recording of tile field coordinates and other physical parameters.
In 2016 the D&SIFCA decided to commission an aerial survey, for which I undertook some preliminary pre-contract work. In considering some of the photogrammetric implications of using drone imagery for estuarial surveys, it became evident that the manpower trade-offs created two significant problems: the management of large numbers of images and their subsequent visual screening for evidence of tiling, and the potential for feature duplication caused by image overlap.
A demonstration custom GUI application, 'IMAGIS', was created as a data management tool and interface to a Geographic Information System (GIS), but the analysis still needed to be carried out manually (mk1 eyeball) on an image-by-image basis. Given the significant survey area (about 350 ha) and the several thousand images involved, photo-analysis demands a significant level of automation if the benefits of drone image acquisition are not to be squandered. Further research may yield a solution.
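The screening problem remains open, but the image-management side lends itself to simple automation. The sketch below, using Pillow (9.4+ for ExifTags.IFD), reads each image's embedded GPS position and buckets frames into roughly 50 m grid cells so that overlapping images covering the same ground can be reviewed together; the folder name and cell size are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

from PIL import Image, ExifTags

def to_degrees(dms, ref):
    """Convert an EXIF (deg, min, sec) tuple plus N/S/E/W ref to signed degrees."""
    d, m, s = (float(x) for x in dms)
    deg = d + m / 60 + s / 3600
    return -deg if ref in ("S", "W") else deg

def image_latlon(path):
    """Return (lat, lon) from embedded EXIF GPS data, or None if absent."""
    gps = Image.open(path).getexif().get_ifd(ExifTags.IFD.GPSInfo)
    if not gps:
        return None
    # GPS IFD tags: 1=LatRef, 2=Lat, 3=LonRef, 4=Lon
    return to_degrees(gps[2], gps[1]), to_degrees(gps[4], gps[3])

# Bucket frames into ~50 m cells (~0.00045 deg lat; ~0.0007 deg lon at UK latitudes)
# so overlapping frames of the same ground can be reviewed as a group.
cells = defaultdict(list)
for p in sorted(Path("survey_images").glob("*.jpg")):   # hypothetical folder
    loc = image_latlon(p)
    if loc:
        lat, lon = loc
        cells[round(lat / 0.00045), round(lon / 0.0007)].append(p.name)

for cell, names in cells.items():
    print(cell, len(names), "overlapping frames")
```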
Since its original publication, this paper has been updated with some recent work on SfM, in particular the use of 'COLMAP' by Johannes Schönberger. Some initial results are presented here in the form of sparse and dense reconstructions from the original set of images used for the VisualSfM project (as above), enhanced with a recently acquired set of images using a different camera, image resolution and lighting conditions. Fig 15 below shows the reconstruction process in real time, with views of its sparse point cloud outputs at figs 16-17 and the dense reconstruction at fig 18, the image feature extraction and matching having already been carried out.
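For reference, a standard COLMAP command-line pipeline producing equivalent sparse and dense outputs is sketched below, here driven from Python. Paths are placeholders; the options follow the COLMAP CLI tutorial and would likely need tuning (e.g. a fisheye camera model) for this imagery.

```python
import subprocess

# Minimal COLMAP pipeline corresponding to the outputs in figs 16-18.
# Paths are placeholders; options follow the standard COLMAP CLI tutorial.
def run(*args):
    subprocess.run(["colmap", *args], check=True)

run("feature_extractor", "--database_path", "survey.db",
    "--image_path", "images")                              # SIFT features per image
run("exhaustive_matcher", "--database_path", "survey.db")  # pairwise matching
run("mapper", "--database_path", "survey.db",
    "--image_path", "images", "--output_path", "sparse")   # sparse model (figs 16-17)
run("image_undistorter", "--image_path", "images",
    "--input_path", "sparse/0", "--output_path", "dense")
run("patch_match_stereo", "--workspace_path", "dense")     # per-image depth maps
run("stereo_fusion", "--workspace_path", "dense",
    "--output_path", "dense/fused.ply")                    # dense cloud (fig 18)
```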
Acknowledgements are made to ChangChang Wu for 'VisualSFM', a GUI application for 3D reconstruction; Michal Jancosek for the multi-view reconstruction software 'CMPMVS'; the designers of 'Meshlab', an open-source, portable and extensible system for the processing and editing of unstructured 3D triangular meshes; and the open-source 3D graphics and animation software 'Blender', without all of which this project would not have been possible.