sh1r4s3@blog /2020/10/14/msync.html% ./show_motd
Hi there, netlander...

sh1r4s3@blog /2020/10/14/msync.html%

In this post I’d like to tell about how to validate the pointer in the Linux based system. Anyone can stumble upon SIGSEGV[1] :) But, what if I’ll tell you that it is possible to check whether a pointer is a valid one or not? There are few techniques to validate a pointer but in my opinion this particular one is the best one. Let me introduce you to the msync()[2] system call. This system call is in the Linux kernel since 1.3.21 and after Linux 2.4.19 it conforms to the POSIX standard [3].

So, with the help of this little old-timer it becomes possible to validate a pointer. Let’s see the description of this system call. This system call was designed to synchronize a file with a memory map, i.e. memory mapped with the mmap() [4] system call. It might be useful to invoke it before the munmap() [4] system call, to synchronize the file and memory before the memory page allocated by the kernel would be freed.

Let’s take a look to the actual code of the msync() from the Linux v5.8 [5]:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
	unsigned long end;
	struct mm_struct *mm = current->mm;
	struct vm_area_struct *vma;
	int unmapped_error = 0;
	int error = -EINVAL;

	start = untagged_addr(start);

	if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC))
		goto out;
	if (offset_in_page(start))
		goto out;
	if ((flags & MS_ASYNC) && (flags & MS_SYNC))
		goto out;
	error = -ENOMEM;
	len = (len + ~PAGE_MASK) & PAGE_MASK;
	end = start + len;
	if (end < start)
		goto out;
	error = 0;
	if (end == start)
		goto out;

In this snippet it tests whether the flags, which was passed to the function, are fine and that start address is aligned by the page size. If one of the conditions is false then it will return with EINVAL [[6]] error. Next, it calculates the end of the memory chunk from the len. Basically, it is the address after the last byte of the last page allocated for the memory region from start to start + len. After that it calculates the end address and performs the basic sanity checks with it. It would be worth to mention that for our purposes the only useful flag would be MS_ASYNC. This flag specifies that call to the msync() returns immediately and an update will be scheduled. Now, let’s continue.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
	mmap_read_lock(mm);
	vma = find_vma(mm, start);
	for (;;) {
		struct file *file;
		loff_t fstart, fend;

		/* Still start < end. */
		error = -ENOMEM;
		if (!vma)
			goto out_unlock;

		if (start < vma->vm_start) {
			start = vma->vm_start;
			if (start >= end)
				goto out_unlock;
			unmapped_error = -ENOMEM;
		}
		// ...
		start = vma->vm_end;
		if ((flags & MS_SYNC) && file &&
				(vma->vm_flags & VM_SHARED)) {
		// ...
		} else {
			if (start >= end) {
				error = 0;
				goto out_unlock;
			}
			vma = vma->vm_next;
		}
	}
out_unlock:
	mmap_read_unlock(mm);
out:
	return error ? : unmapped_error;

This snippet is a quite complex, but there is no need to discuss all the branches in this code. At the beginning it locks the mm_struct and searches the first Virtual Memory Area (VMA) which should contain the start address (start < vma->vm_end). In the loop it will iterate over VMA regions until it validate that region from the start to the end is inside the VMA regions. Basically, it means that it checks whether this range is in the allocated virtual memory region.

With all that knowledge we just got there let’s write a function that will check whether the memory area is allocated by the process or not. Basically, in this function we need to align the pointer address by the page size and call the msync() system call. Here is a simple implementation:

1
2
3
4
5
6
7
8
9
10
int
valid_pointer(void *p, size_t len)
{
    // Get page size and calculate page mask
    size_t pagesz = sysconf(_SC_PAGESIZE);
    size_t pagemask = ~(pagesz - 1);
    // Calculate base address
    void *base = (void *)(((size_t)p) & pagemask);
    return msync(base, len, MS_ASYNC) == 0;
}

Below is an example of the validate_pointer() function use.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>

// Check the pointer
int
valid_pointer(void *p, size_t len)
{
    // Get page size and calculate page mask
    size_t pagesz = sysconf(_SC_PAGESIZE);
    size_t pagemask = ~(pagesz - 1);
    // Calculate base address
    void *base = (void *)(((size_t)p) & pagemask);
    return msync(base, len, MS_ASYNC) == 0;
}

int
main()
{
    // Map memory
    int *p = (int *)mmap(NULL,
                         sizeof(int),
                         PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS,
                         -1,
                         0);
    // Check whether the mapping was successful
    if ((void *)p == MAP_FAILED)
    {
        puts("mmap failed");
        return 1;
    }
    // Check the pointer before we unmap the memory
    printf("check pointer before munmap... ");
    if (valid_pointer((void *)p, sizeof(p)))
        printf("valid\n");
    else
        printf("NOT valid\n");
    *p = 0x00c0ffee;
    // Unmap the memory
    if (munmap((void *)p, sizeof(int)))
    {
        puts("munmap failed");
        return 1;
    }
    // Check the pointer after we unmap the memory
    printf("check pointer after munmap... ");
    if (valid_pointer((void *)p, sizeof(*p)))
        printf("valid\n");
    else
        printf("NOT valid\n");
    // Produce SIGSEGV :)
    *p = 0xbaaaaaad;

    return 0;
}

A little update:

This example if just a toy model and should not be implemented as is. It is worth to mention that one should avoid to use this technique with malloc/free implementations [7] because they provide their own memory management system. Also, it needs to be used with caution in multi-threaded systems and in systems with high security requirements. One of the real implementation of the method is in the ROOT data analysis framework. It is used to check pointers in the C++ interpreter [8] . I hope that this information could be useful for someone :)